Abstract

We give a novel spectral approximation algorithm for the balanced separator problem that,
given a graph $G$, a constant balance $b \in (0,1/2]$, and a parameter $\gamma$, either finds an
$\Omega(b)$-balanced cut of conductance $O(\sqrt{\gamma})$ in $G$, or outputs a certificate that all
$b$-balanced cuts in $G$ have conductance at least $\gamma$, and runs in time $\tilde{O}(m)$. This
settles the question of designing asymptotically optimal spectral algorithms for balanced separator.
Our algorithm relies on a variant of the heat kernel random walk and requires, as a subroutine, an
algorithm to compute $\exp(-L)v$ where $L$ is the Laplacian of a graph related to $G$ and $v$ is a
vector. Algorithms for computing the matrix-exponential-vector product efficiently comprise our next
set of results. Our main result here is a new algorithm which computes a good approximation to
$\exp(-A)v$ for a class of symmetric positive semidefinite (PSD) matrices $A$ and a given vector
$v$, in time roughly $\tilde{O}(m_A)$, where $m_A$ is the number of non-zero entries of $A$. This
uses, in a non-trivial way, the breakthrough result of Spielman and Teng on inverting symmetric and
diagonally-dominant matrices in $\tilde{O}(m_A)$ time. Finally, we prove that $e^{-x}$ can be
uniformly approximated up to a small additive error, in a non-negative interval $[a, b]$, with a
polynomial of degree roughly $\sqrt{b - a}$. While this result is of independent interest in
approximation theory, we show that, via the Lanczos method from numerical analysis, it yields a
simple algorithm to compute $\exp(-A)v$ for symmetric PSD matrices that runs in time roughly
$\tilde{O}(t_A \cdot \sqrt{\|A\|})$, where $t_A$ is the time required for the computation of the
vector $Aw$ for a given vector $w$. As an application, we obtain a simple and practical algorithm,
with output conductance $O(\sqrt{\gamma})$, for balanced separator that runs in time
$\tilde{O}(m/\sqrt{\gamma})$. This latter algorithm matches the running time, but improves on the
approximation guarantee of the Evolving-Sets-based algorithm by Andersen and Peres for balanced
separator.
1 Introduction and Our Results



1.1 Balanced Separator 

The Balanced Separator problem (BS) asks the following decision question: given an unweighted
graph $G = (V,E)$, $V = [n]$, $|E| = m$, a constant balance parameter $b \in (0,1/2]$, and a
target conductance value $\gamma \in (0,1)$, does $G$ have a $b$-balanced cut $S$ such that
$\phi(S) \le \gamma$? Here, the conductance of a cut $(S, \bar{S})$ is defined to be
$\phi(S) = |E(S,\bar{S})| / \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}$, where $\mathrm{vol}(S)$
is the sum of the degrees of the vertices in the set $S$. Moreover, a cut $(S, \bar{S})$ is
$b$-balanced if $\min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\} \ge b \cdot \mathrm{vol}(V)$. This is
a classic NP-hard problem and a central object of study for the development of approximation
algorithms, both in theory and in practice. On the theoretical side, BS has far reaching connections
to spectral graph theory, the study of random walks and metric embeddings. In practice, algorithms
for BS play a crucial role in the design of recursive algorithms [35], clustering [19] and
scientific computation [32].
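
To make these definitions concrete, the following is a minimal sketch that evaluates the conductance of a cut and the $b$-balance condition on a small graph. The adjacency-dictionary representation, the function names and the example graph are assumptions made for illustration only; they do not appear in the paper.

```python
def volume(adj, vertices):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in vertices)


def conductance(adj, S):
    """phi(S) = |E(S, S-bar)| / min{vol(S), vol(S-bar)} for an unweighted graph.

    Assumes S is a proper, non-empty subset of the vertices.
    """
    S = set(S)
    cut_edges = sum(1 for u in S for w in adj[u] if w not in S)
    vol_S = volume(adj, S)
    vol_rest = volume(adj, adj) - vol_S
    return cut_edges / min(vol_S, vol_rest)


def is_balanced(adj, S, b):
    """True if min{vol(S), vol(S-bar)} >= b * vol(V)."""
    vol_V = volume(adj, adj)
    vol_S = volume(adj, set(S))
    return min(vol_S, vol_V - vol_S) >= b * vol_V


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
example_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
```

On this example, the cut separating the two triangles has one crossing edge and volume 7 on each side, so it is $1/2$-balanced with conductance $1/7$.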

Spectral methods are an important set of techniques in the design of graph-partitioning
algorithms and are fundamentally based on the study of the behavior of random walks over the
instance graph. Spectral algorithms tend to be conceptually appealing, because of the intuition
based on the underlying diffusion process, and easy to implement, as many of the primitives
required, such as eigenvector computation, already appear in highly-optimized software packages.
The most important spectral algorithm for graph partitioning is the Laplacian Eigenvector
(LE) algorithm of Alon and Milman [2], which, given a graph of conductance at most $\gamma$, outputs
a cut of conductance at most $O(\sqrt{\gamma})$, an approximation guarantee that is asymptotically
optimal for spectral algorithms. A consequence of the seminal work of Spielman and Teng [37] is
that the LE algorithm can run in time $\tilde{O}(m)$ using the Spielman-Teng solver. Hence, LE is an
asymptotically optimal spectral algorithm for the minimum-conductance problem, both for running
time (up to polylog factors) and approximation quality. In this paper, we present a simple
random-walk-based algorithm that is the first such asymptotically optimal spectral algorithm for
BS. Our algorithm can be seen as an analogue to the LE algorithm for the balanced version of the
minimum-conductance problem and settles the question of designing spectral algorithms for BS.
The following is our main theorem on graph partitioning.

Theorem 1.1 (Spectral Algorithm for Balanced Separator) Given an unweighted graph $G = (V, E)$,
a balance parameter $b \in (0,1/2]$, $b = \Omega(1)$, and a conductance value $\gamma \in (0,1)$, we
give an algorithm called BalSep$(G, b, \gamma)$, that either outputs an $\Omega(b)$-balanced cut
$S \subset V$ such that $\phi(S) \le O(\sqrt{\gamma})$, or outputs a certificate that no
$b$-balanced cut of conductance $\gamma$ exists. BalSep runs in time $O(m \, \mathrm{poly}(\log n))$.

The algorithm for Theorem 1.1 relies on our ability to compute the product of the matrix-exponential
of a matrix and an arbitrary vector in time essentially proportional to the sparsity of the matrix.
Our contributions to the problem of computing the matrix-exponential-vector product appear in
detail in Section 1.2. The algorithm required for Theorem 1.1 runs in time $\tilde{O}(m)$ and,
notably, makes use of the Spielman-Teng solver in a non-trivial way. We also prove an alternative
novel result on how to perform this matrix-exponential computation, which relies just on
matrix-vector products. This result, when combined with our BS algorithm based on random walks,
yields a theorem identical to Theorem 1.1 except that the running time now increases to
$\tilde{O}(m/\sqrt{\gamma})$; see Theorem 3.1. However, this latter algorithm not only turns out to
be almost as simple and practical as the LE algorithm, but it also improves in the approximation
factor upon the result of Andersen and Peres [5], who obtain the same running time using an
Evolving-Sets-based random walk.
1.1.1 Comparison to Previous Work on Balanced Separator



The best known approximation for BS is $O(\sqrt{\log n})$, achieved by the seminal work of Arora,
Rao and Vazirani [8] that combines semidefinite programming (SDP) and flow ideas. A rich line
of research has centered on reducing the running time of this algorithm using SDP and flow
ideas [20, 7, 24]. This effort culminated in Sherman's work [33], which brings down the required
running time to $O(n^{\varepsilon})$ s-t maximum-flow computations.^1 However, these algorithms are
based on advanced theoretical ideas that are not easy to implement or even capture in a principled
heuristic. Moreover, they fail to achieve a nearly-linear^2 running time, which is crucial in many
of today's applications that involve very large graphs. To address these issues, researchers have
focused on the design of simple, nearly-linear-time algorithms for BS based on spectral techniques.
The simplest spectral algorithm for BS is the Recursive Laplacian Eigenvector (RLE) algorithm (see,
for example, [19]). This algorithm iteratively uses LE to remove low-conductance unbalanced
cuts from $G$, until a balanced cut or an induced $\gamma$-expander is found. The running time of
the RLE algorithm is quadratic in the worst case, as $\Omega(n)$ unbalanced cuts may be found, each
requiring a global computation of the eigenvector. Spielman and Teng [36] were the first to design
nearly-linear-time algorithms outputting an $\Omega(b)$-balanced cut of conductance
$O(\sqrt{\gamma \, \mathrm{polylog}\, n})$, if a $b$-balanced cut of conductance less than $\gamma$
exists. Their algorithmic approach is based on local random walks, which are used to remove
unbalanced cuts in time proportional to the size of the cut removed, hence avoiding the quadratic
dependence of RLE. Using similar ideas, Andersen, Chung and Lang [4], and Andersen and Peres [5]
improved the approximation guarantee to $O(\sqrt{\gamma \log n})$ and the running time to
$\tilde{O}(m/\sqrt{\gamma})$. More recently, Orecchia and Vishnoi (OV) [25] employed an SDP
formulation of the problem, together with the Matrix Multiplicative Weight Update (MMWU) of [7] and
a new SDP rounding, to obtain an output conductance of $O(\sqrt{\gamma})$ with running time
$\tilde{O}(m/\gamma^2)$, effectively removing unbalanced cuts in $O(\log n/\gamma)$ iterations. In
Section 5, we give a more detailed comparison with OV and a discussion of our novel width-reduction
techniques from an optimization point of view. Finally, our algorithm should also be compared to the
remarkable results of Madry [22] for BS, which build on Racke's work and on the low-stretch
spanning trees from Abraham et al. [1], to achieve a trade-off between running time and
approximation. For every integer $k \ge 1$, he achieves roughly $O((\log n)^k)$ approximation in time
$\tilde{O}(m + 2^k \cdot n^{1+2^{-k}})$. Calculations show that for $\gamma \ge 2^{-(\log\log n)^2}$,
our algorithm achieves strictly better running time and approximation than Madry's for sparse
graphs.^3 More importantly, we believe that our algorithm is significantly simpler, especially in
its second form mentioned above, and likely to find applications in practical settings.

1.2 The Matrix Exponential, the Lanczos Method and Approximations to $e^{-x}$

We first state a few definitions used in this section. We will work with $n \times n$, symmetric and
positive semi-definite (PSD) matrices over $\mathbb{R}$. For a matrix $M$, abusing notation, we
denote its exponential by $\exp(-M)$, or by $e^{-M}$, and define it as
$\sum_{i \ge 0} \frac{(-1)^i}{i!} M^i$. $M$ is said to be Symmetric and Diagonally Dominant (SDD) if
$M_{ij} = M_{ji}$ for all $i, j$ and $M_{ii} \ge \sum_{j \ne i} |M_{ij}|$ for all $i$. Let $m_M$
denote the number of non-zero entries in $M$ and let $t_M$ denote the time required to multiply the
matrix

^1 Even though the results of [8] and [33] are stated for the Sparsest Cut problem, the same techniques apply to the
conductance problem, e.g. by modifying the underlying flow problems. See for example (S). 
^2 Following the convention of [38], we denote by nearly-linear a running time of $\tilde{O}(m/\mathrm{poly}(\gamma))$.

^3 In the table on Page 4 of the full version of Madry's paper, it has been erroneously claimed that, using the
Spielman-Teng solver, the Alon-Milman algorithm runs in time $\tilde{O}(m)$ for BS.
$M$ with a given vector $v$. In general, $t_M$ depends on how $M$ is given as an input and can be
$\Theta(n^2)$. However, it is possible to exploit the special structure of $M$ if it is given as an
input appropriately: it is possible to just multiply the non-zero entries of $M$, giving
$t_M = O(m_M)$. Also, if $M$ is a rank one matrix $ww^\top$, where $w$ is known, we can multiply
with $M$ in $O(n)$ time. We move on to our results.

At the core of our algorithm for BS, and more generally of most MMWU based algorithms, lies
an algorithm to quickly compute $\exp(-A)v$ for a PSD matrix $A$ and a unit vector $v$. It is
sufficient to compute an approximation $u$ to $\exp(-A)v$, in time which is as close as possible to
$t_A$. It can be shown that using about $\|A\|$ terms in the Taylor series expansion of $\exp(-A)$,
one can find a vector $u$ that approximates $\exp(-A)v$. Hence, this method runs in time roughly
$O(t_A \cdot \|A\|)$. In our application, and in certain others, this dependence on the norm is
prohibitively large. The following remarkable result was cited in Kale [18].
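
The Taylor-series baseline described above can be sketched as follows. This is an illustration under stated assumptions rather than code from the paper: the function name and the `matvec` callback interface are hypothetical, and the number of terms is left to the caller.

```python
import numpy as np


def taylor_expm_apply(matvec, v, num_terms):
    """Approximate exp(-A) v by the first `num_terms` terms of sum_j (-A)^j v / j!.

    `matvec(x)` must return A @ x, so the cost is (num_terms - 1) products with A.
    """
    term = np.array(v, dtype=float)      # j = 0 term: v
    total = term.copy()
    for j in range(1, num_terms):
        term = -matvec(term) / j         # (-A)^j v / j! from the previous term
        total = total + term
    return total
```

Each term costs one product with $A$, and the terms only start to decay once $j$ exceeds $\|A\|$, which is the source of the $O(t_A \cdot \|A\|)$ running time. For large $\|A\|$ the alternating terms also cancel badly in floating point, so this sketch is only meant for small norms.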

Hypothesis. Let $A \succeq 0$ and $\varepsilon > 0$. There is an algorithm that requires
$O(\log^2 1/\varepsilon)$ iterations to find a vector $u$ such that
$\|\exp(-A)v - u\| \le \|\exp(-A)\| \varepsilon$, for any unit vector $v$. The time for every
iteration is $O(t_A)$.

This hypothesis would suffice to prove Theorem 1.1. But, to the best of our knowledge, there
is no known proof of this result. In fact, the source of this unproved hypothesis can be traced to
a paper of Eshof and Hochbruck (EH) [15]. EH suggest that one may use the Lanczos method
(described later), and combine it with a rational approximation for $e^{-x}$ due to Saff, Schonhage
and Varga [30], to reduce the computation of $\exp(-A)v$ to a number of $(I + \alpha A)^{-1}v$
computations for some $\alpha > 0$. Note that this is insufficient to prove the hypothesis above as
there is no known way to compute $(I + \alpha A)^{-1}v$ in time $O(t_A)$. They note this and propose
the use of iterative methods to do this computation. They also point out that this will only result
in an approximate solution to $(I + \alpha A)^{-1}v$ and make no attempt to analyze the running time
or the error of their method when the inverse computation is approximate. We believe that we are
quite distant from proving the hypothesis for all PSD matrices and, moreover, that proving such a
result may provide valuable insights into a fast (approximate) inversion method for symmetric PSD
matrices, an extremely important open problem.

A significant part of this paper is devoted to a proof of the above hypothesis for a class of
PSD matrices that turns out to be sufficient for the BS application. For the norm-independent,
fast-approximate inverse computation, we appeal to the result of Spielman and Teng [37] (also see
improvements by Koutis, Miller and Peng [21]). The theorem we prove is the following.

Theorem 1.2 (SDD Matrix Exponential Computation) Given an $n \times n$ SDD matrix $A$, a vector $v$
and a parameter $\delta \le 1$, there is an algorithm that can compute a vector $u$ such that
$\|\exp(-A)v - u\| \le \delta \|v\|$ in time $\tilde{O}((m_A + n) \log(2 + \|A\|))$. Here the tilde
hides $\mathrm{poly}(\log n)$ and $\mathrm{poly}(\log 1/\delta)$ factors.

First, we note that for our application, the dependence of the running time on $\log(2 + \|A\|)$
turns out to just contribute an extra $\log n$ factor. Also, for our application,
$\delta = 1/\mathrm{poly}(n)$. Secondly, for our BS application, the matrix we need to invert is not
SDD or sparse. Fortunately, we can combine the Spielman-Teng solver with the Sherman-Morrison
formula to invert our matrices; see Theorem 3.2. A significant effort goes into analyzing the effect
of the error introduced due to approximate matrix inversion. This error can cascade due to the
iterative nature of our algorithm that proves this theorem.

Towards proving the hypothesis above, when the only guarantee we know on the matrix is
that it is symmetric and PSD, we prove the following theorem, which is the best known algorithm
to compute $\exp(-A)v$ for an arbitrary symmetric PSD matrix $A$, when
$\|A\| = \omega(\mathrm{poly}(\log n))$.
Theorem 1.3 (PSD Matrix Exponential Computation) Given an $n \times n$ symmetric PSD matrix $A$, a
vector $v$ and a parameter $\delta \le 1$, there is an algorithm that computes a vector $u$ such that
$\|\exp(-A)v - u\| \le \delta \|v\|$ in time
$\tilde{O}\big((t_A + n)\sqrt{1 + \|A\|}\, \log(2 + \|A\|)\big)$. Here the tilde hides
$\mathrm{poly}(\log n)$ and $\mathrm{poly}(\log 1/\delta)$ factors.

In the symmetric PSD setting we also prove the following theorem which, for our application,
gives a result comparable to Theorem 1.3.

Theorem 1.4 (Simple PSD Matrix Exponential Computation) Given an $n \times n$ symmetric PSD
matrix $A$, a vector $v$ and a parameter $\delta \le 1$, there is an algorithm that computes a vector
$u$ such that $\|\exp(-A)v - u\| \le \delta \|v\|$, in time $O((t_A + n) \cdot k + k^2)$, where
$k = \tilde{O}(\sqrt{1 + \|A\|})$. Here the tilde hides $\mathrm{poly}(\log 1/\delta)$ factors.

As noted above, $t_A$ can be significantly smaller than $m_A$. Moreover, the algorithm of
Theorem 1.4 only uses multiplication of a vector with the matrix $A$ as a primitive and does not
require matrix inversion. Consequently, it does not need tools like the SDD solver or conjugate
gradient, thus obviating the error analysis required for the previous algorithms. Furthermore, this
algorithm is very simple and, when combined with our random walk-based BalSep algorithm, results in
a very simple and practical $O(\sqrt{\gamma})$ approximation algorithm for BS that runs in time
$\tilde{O}(m/\sqrt{\gamma})$.
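
A matrix-vector-product-only computation of this kind can be illustrated with the standard textbook Lanczos approximation $\exp(-A)v \approx \|v\| \, V_k \exp(-T_k) e_1$. The sketch below is not taken from the paper: the function name and `matvec` interface are hypothetical, the number of steps `k` is chosen by the caller rather than by the bound of Theorem 1.4, and full reorthogonalization is added as an assumption for numerical robustness.

```python
import numpy as np


def lanczos_expm_apply(matvec, v, k):
    """Approximate exp(-A) v for symmetric A using at most k products with A."""
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    k = min(k, n)
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros(n)
    V = np.zeros((n, k))        # orthonormal basis of the Krylov subspace
    alpha = np.zeros(k)         # diagonal of the tridiagonal matrix T
    beta = np.zeros(k)          # beta[j] couples basis vectors j and j + 1
    V[:, 0] = v / norm_v
    steps = k
    for j in range(k):
        w = matvec(V[:, j])
        alpha[j] = V[:, j] @ w
        # Full reorthogonalization against all previous basis vectors (two passes).
        for _ in range(2):
            w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        b = np.linalg.norm(w)
        if j + 1 == k or b < 1e-12:   # last step, or an invariant subspace was found
            steps = j + 1
            break
        beta[j] = b
        V[:, j + 1] = w / b
    off = beta[:steps - 1]
    T = np.diag(alpha[:steps]) + np.diag(off, 1) + np.diag(off, -1)
    evals, evecs = np.linalg.eigh(T)
    # exp(-T) e_1 = U exp(-Lambda) U^T e_1, and U^T e_1 is the first row of U.
    coeffs = evecs @ (np.exp(-evals) * evecs[0, :])
    return norm_v * (V[:, :steps] @ coeffs)
```

The only access to $A$ is through `matvec`, so the cost is dominated by `k` matrix-vector products plus an eigendecomposition of a small $k \times k$ tridiagonal matrix.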

Theorem 1.4 relies on the Lanczos method which can be used to convert guarantees about
polynomial approximation from scalars to matrices. In particular, it uses the following structural
result (the upper bound) on the best degree $k$ polynomial $\delta$-uniformly approximating $e^{-x}$
in an interval $[a, b]$. We also prove a lower bound which establishes that the degree cannot be
improved beyond lower order terms. This suggests that improving on the $\tilde{O}(m/\sqrt{\gamma})$
running time in Theorem 1.4 requires more advanced techniques. To the best of our knowledge, this
theorem is new and is of independent interest in approximation theory.

Theorem 1.5 (Uniform Approximation to $e^{-x}$)

• Upper Bound. For every $0 \le a < b$, and $0 < \delta \le 1$, there exists a polynomial $p$ that
satisfies $\sup_{x \in [a,b]} |e^{-x} - p(x)| \le \delta \cdot e^{-a}$, and has degree
$O\big(\sqrt{\max\{\log^2 1/\delta, (b - a) \cdot \log 1/\delta\}} \cdot (\log 1/\delta) \cdot \log\log 1/\delta\big)$.

• Lower Bound. For every $0 \le a < b$ such that $a + \log_e 4 \le b$, and $\delta \in (0, 1/8]$, any
polynomial $p(x)$ that approximates $e^{-x}$ uniformly over the interval $[a, b]$ up to an error of
$\delta \cdot e^{-a}$, must have degree at least $\frac{1}{2} \cdot \sqrt{b - a}$.
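
The $\sqrt{b - a}$ scaling can be observed numerically. The following sketch measures the uniform error of a polynomial approximation to $e^{-x}$ on $[a, b]$; it uses Chebyshev interpolation from NumPy as a convenient stand-in, which is an assumption for illustration and not the polynomial constructed in the proof of Theorem 1.5. The function name and the sampling grid are likewise hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def uniform_error(a, b, degree, samples=5001):
    """Max error on a grid of [a, b] of the degree-`degree` Chebyshev interpolant of e^{-x}."""
    p = Chebyshev.interpolate(lambda x: np.exp(-x), degree, domain=[a, b])
    xs = np.linspace(a, b, samples)
    return float(np.max(np.abs(np.exp(-xs) - p(xs))))
```

For example, on $[0, 100]$ one can compare `uniform_error(0.0, 100.0, 5)` with `uniform_error(0.0, 100.0, 50)`: a degree that is a small multiple of $\sqrt{b - a} = 10$ already gives a very small error, whereas a truncated Taylor series would need a degree on the order of $b - a$.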

2 Organization of the Main Body of the Paper 

In Section 3 we present a technical overview of our results and in Section 4 we discuss the open
problems arising from our work. The main body of the paper follows after it and is divided into
three sections, each of which has been written so that it can be read independently. Section 5
contains a complete description and all the proofs related to Theorem 1.1 and Theorem 3.1. Section
6 contains our results on computing the matrix exponential; in particular the proofs of Theorems
1.2–1.4 and 3.2. Section 7 contains the proof of our structural results on approximating $e^{-x}$
and the proof of Theorem 1.5.
3 Technical Overview of Our Results



3.1 Our Spectral Algorithm for Balanced Separator 

In this section, we provide an overview of Theorem 1.1. As pointed out in the introduction, our
algorithm, BalSep, when combined with the matrix-exponential-vector algorithm in Theorem 1.4,
results in a very simple and practical algorithm for BS. We record the theorem here for
completeness and then move on to the overview of BalSep and its proof. The proof of this theorem
appears in Section 5.

Theorem 3.1 (Simple Spectral Algorithm for Balanced Separator) Given an unweighted graph
$G = (V, E)$, a balance parameter $b \in (0,1/2]$, $b = \Omega(1)$, and a conductance value
$\gamma \in (0,1)$, we give an algorithm, which runs in time $\tilde{O}(m/\sqrt{\gamma})$, that
either outputs an $\Omega(b)$-balanced cut $S \subset V$ such that $\phi(S) \le O(\sqrt{\gamma})$ or
outputs a certificate that no $b$-balanced cut of conductance $\gamma$ exists.

3.1.1 Comparison with the RLE Algorithm 

Before we explain our algorithm, it is useful to review the RLE algorithm. Recall that given $G$,
$\gamma$ and $b$, the goal of the BS problem is to either certify that every $b$-balanced cut in $G$
has conductance at least $\gamma$, or produce an $\Omega(b)$-balanced cut in $G$ of conductance
$O(\sqrt{\gamma})$. RLE does this by applying LE iteratively to remove unbalanced cuts of
conductance $O(\sqrt{\gamma})$ from $G$. The iterations stop and the algorithm outputs a cut when
it either finds a $(b/2)$-balanced cut of conductance $O(\sqrt{\gamma})$ or the union of all
unbalanced cuts found so far is $(b/2)$-balanced. Otherwise, the algorithm terminates when the
residual graph has spectral gap at least $2\gamma$. In the latter case, any $b$-balanced cut must
have at least half of its volume lie within the final residual graph, and hence, has conductance
at least $\gamma$ in the original graph. Unfortunately, this algorithm may require $\Omega(n)$
iterations in the worst case. For instance, this is true if the graph $G$ consists of $\Omega(n)$
components loosely connected to an expander-like core through cuts of low conductance. This example
highlights the weakness of the RLE approach: the second eigenvector of the Laplacian may only be
correlated with one low-conductance cut and fail to capture at all even cuts of slightly larger
conductance. This limitation makes it impossible for RLE to make significant progress at any
iteration. We now proceed to show how to fix RLE and present our algorithm at a high level.
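
For reference, a single LE step of the kind that RLE repeats can be sketched as a sweep cut along the second eigenvector of the normalized Laplacian. This is a dense, textbook-style illustration: the function name is hypothetical, the eigenvector is computed with a full eigendecomposition rather than a nearly-linear-time solver, and the graph is assumed to be connected with no isolated vertices or self-loops.

```python
import numpy as np


def laplacian_eigenvector_cut(adjacency):
    """Sweep cut along the second eigenvector of D^{-1/2} L D^{-1/2}.

    Returns (sorted vertex list of the best prefix, its conductance).
    """
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)
    n = A.shape[0]
    inv_sqrt_d = 1.0 / np.sqrt(d)
    normalized_laplacian = np.eye(n) - inv_sqrt_d[:, None] * A * inv_sqrt_d[None, :]
    _, eigenvectors = np.linalg.eigh(normalized_laplacian)   # ascending eigenvalues
    embedding = inv_sqrt_d * eigenvectors[:, 1]              # one real number per vertex
    order = np.argsort(embedding)

    total_volume = d.sum()
    in_prefix = np.zeros(n, dtype=bool)
    prefix_volume, cut_weight = 0.0, 0.0
    best_phi, best_size = np.inf, 0
    for size, u in enumerate(order[:-1], start=1):
        # Adding u: its edges into the prefix stop being cut, its other edges become cut.
        to_prefix = A[u, in_prefix].sum()
        cut_weight += d[u] - 2.0 * to_prefix
        prefix_volume += d[u]
        in_prefix[u] = True
        phi = cut_weight / min(prefix_volume, total_volume - prefix_volume)
        if phi < best_phi:
            best_phi, best_size = phi, size
    return sorted(int(u) for u in order[:best_size]), float(best_phi)
```

Because the sweep looks at a single one-dimensional embedding, each call can expose only cuts that are well aligned with that one eigenvector, which is the limitation discussed above.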

3.1.2 High-Level Idea of Our Algorithm 

Rather than working with the vertex embedding given by the eigenvector, at iteration $t$, we will
consider the multi-dimensional vector embedding represented by the transition probability
matrix $P^{(t)}$ of a certain random walk over the graph. We refer to this kind of walk as an
Accelerated Heat Kernel Walk (AHK) and we describe it formally in Section 3.1.4. At each iteration
$t = 1, 2, \ldots$, the current AHK walk is simulated for $\tau = \log n/\gamma$ time to obtain
$P^{(t)}$. For any $t$, this choice of $\tau$ ensures that the walk must mix across all cuts of
conductance much larger than $\gamma$, hence emphasizing cuts of the desired conductance in the
embedding $P^{(t)}$. The embedding obtained in this way can be seen as a weighted combination of
multiple eigenvectors, with eigenvectors of low eigenvalue contributing more weight. Hence, the
resulting embedding captures not only the cut corresponding to the second eigenvector, but also cuts
associated with other eigenvectors of eigenvalue close to $\gamma$. This enables our algorithm to
potentially find many different low-conductance unbalanced cuts at once. Moreover, the random walk
matrix is more stable than the eigenvector under small perturbations of the graph, making it
possible to precisely quantify our progress from one iteration to the next as a function of the
mixing of the current random walk. For technical reasons, we are unable to show that we make
sufficient progress if we just remove the unbalanced cuts found, as in RLE. Instead, if we find a
low-conductance unbalanced cut $S^{(t)}$ at iteration $t$, we perform a soft removal, by modifying
the current walk $P^{(t)}$ to accelerate the convergence to stationarity on the set $S^{(t)}$. This
ensures that a different cut is found using $P^{(t+1)}$ in the next iteration. In particular, the
AHK walks we consider throughout the execution of the algorithm will behave like the standard heat
kernel on most of the graph, except on a small unbalanced subset of vertices, where their
convergence will be accelerated. We now present our algorithm in more detail. We first recall some
definitions.

3.1.3 Definitions 

$G = (V,E)$ is the unweighted instance graph, where $V = [n]$ and $|E| = m$. We let
$d \in \mathbb{R}^n$ be the degree vector of $G$, i.e., $d_i$ is the degree of vertex $i$. For a
subset $S \subseteq V$, we define the edge volume as $\mathrm{vol}(S) = \sum_{i \in S} d_i$. The
total volume of $G$ is $2m$. We denote by $K_V$ the complete graph with weight $d_i d_j/2m$ between
every pair $i, j \in V$. For $i \in V$, $S_i$ is the star graph rooted at $i$, with edge weight of
$d_i d_j/2m$ between $i$ and $j$, for all $j \in V$. For an undirected graph $H = (V, E_H)$, let
$A(H)$ denote the adjacency matrix of $H$, and $D(H)$ the diagonal matrix of degrees of $H$. The
(combinatorial) Laplacian of $H$ is defined as $L(H) = D(H) - A(H)$. By $D$ and $L$, we denote
$D(G)$ and $L(G)$ respectively for the input graph $G$. For two matrices $A, B$ of equal dimensions,
let $A \bullet B = \mathrm{Tr}(A^\top B) = \sum_{ij} A_{ij} \cdot B_{ij}$. $\mathbf{0}$ denotes the
all 0s vector.

3.1.4 The AHK Random Walk and its Mixing 

We will be interested in continuous-time random walk processes over $V$ that take into account
the edge structure of $G$. The simplest such process is the heat kernel process, which has already
found many applications in graph partitioning, particularly in the work of Chung [14], and in
many of the combinatorial algorithms for solving graph partitioning SDPs [23]. The heat kernel is
defined as the continuous-time process having transition rate matrix $-LD^{-1}$ and, hence, at time
$\tau$ the probability-transition matrix becomes $\exp(-\tau LD^{-1})$. The AHK random walk process
has an additional parameter $\beta$, which is a non-negative vector in $\mathbb{R}^n$. The
transition rate matrix is then defined to be $-(L + \sum_{i \in V} \beta_i L(S_i))D^{-1}$. The effect
of adding the star terms to the transition rate matrix is that of accelerating the convergence of
the process to stationarity at vertices $i$ with large value of $\beta_i$, since a large fraction of
the probability mass that leaves these vertices is distributed uniformly over the edges. We denote
by $P_\tau(\beta)$ the probability-transition matrix
$\exp(-\tau(L + \sum_{i \in V} \beta_i L(S_i))D^{-1})$. As is useful in the study of spectral
properties of non-regular graphs, we will study $D^{-1}P_\tau(\beta)$. This matrix describes the
probability distribution over the edges of $G$ and has the advantage of being symmetric and PSD:
$D^{-1}P_\tau(\beta) = D^{-1/2} \exp(-\tau D^{-1/2}(L + \sum_{i \in V} \beta_i L(S_i))D^{-1/2}) D^{-1/2}$.
In particular, $D^{-1}P_\tau(\beta)$ can be seen as the Gram matrix of the embedding given by the
columns of its square root, which in this case is just $D^{-1/2}P_{\tau/2}(\beta)$. This property
will enable us to use geometric SDP-rounding techniques to analyze AHK walks. Throughout the
algorithm, we keep track of the mixing of $P^{(t)}$ by considering the total deviation of $P^{(t)}$
from stationarity, i.e., the sum over all vertices $i \in V$ of the $\ell_2^2$-distance from the
stationary distribution of $P^{(t)}e_i$. Here, $e_i$ denotes the vector in $\mathbb{R}^n$ which is
1 at the $i$-th coordinate and 0 elsewhere. We denote the contribution of vertex $i$ to this
distance by $\Psi(P_\tau(\beta), i)$. Similarly, the total deviation from stationarity over a subset
$S \subseteq V$ is given by $\Psi(P_\tau(\beta), S) = \sum_{i \in S} \Psi(P_\tau(\beta), i)$.
$\Psi(P_\tau(\beta), V)$ will play the role of potential function in our algorithm. Moreover, it
follows from the definition of $\Psi$ that
$\Psi(P_\tau(\beta), V) = L(K_V) \bullet D^{-1}P_\tau(\beta)$. Finally, to connect to the high-level
idea described earlier, for each iteration $t$, we use $P^{(t)} = P_\tau(\beta^{(t)})$ for
$\tau = \log n/\gamma$ with
$\beta^{(t)} \approx \frac{1}{\tau} \sum_{j=1}^{t-1} \sum_{i \in S^{(j)}} e_i$ and starting with
$\beta^{(1)} = \mathbf{0}$. We now provide a slightly more detailed description of our algorithm
from Theorem 1.1 (called BalSep) and its analysis. For reference, BalSep appears in Figure 1.
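
The objects of this section can be computed directly on a small dense example. The sketch below builds $L + \sum_i \beta_i L(S_i)$, forms the symmetric matrix $D^{-1}P_\tau(\beta)$ through an eigendecomposition, and evaluates the potential $L(K_V) \bullet D^{-1}P_\tau(\beta)$. It is an illustration only: the function name and return values are hypothetical, the star and $K_V$ edge weights are taken to be $d_i d_j/2m$, and a full eigendecomposition replaces the fast approximate exponentiation that the actual algorithm relies on.

```python
import numpy as np


def ahk_gram_matrix(adjacency, beta, tau):
    """Return (D^{-1} P_tau(beta), Psi(P_tau(beta), V), eigenvalues of the normalized matrix)."""
    A = np.asarray(adjacency, dtype=float)
    beta = np.asarray(beta, dtype=float)
    d = A.sum(axis=1)
    two_m = d.sum()
    D = np.diag(d)
    L = D - A

    # L(K_V) for edge weights d_i d_j / 2m equals D - d d^T / 2m.
    L_KV = D - np.outer(d, d) / two_m

    # M = L + sum_i beta_i L(S_i), where S_i has weight d_i d_j / 2m on the edge {i, j}.
    M = L.copy()
    for i in np.flatnonzero(beta):
        e_i = np.zeros(len(d))
        e_i[i] = 1.0
        star = (d[i] / two_m) * (
            two_m * np.outer(e_i, e_i) - np.outer(e_i, d) - np.outer(d, e_i) + D
        )
        M += beta[i] * star

    inv_sqrt_d = 1.0 / np.sqrt(d)
    C = inv_sqrt_d[:, None] * M * inv_sqrt_d[None, :]        # D^{-1/2} M D^{-1/2}
    eigenvalues, U = np.linalg.eigh(C)
    exp_C = (U * np.exp(-tau * eigenvalues)) @ U.T            # exp(-tau C)
    gram = inv_sqrt_d[:, None] * exp_C * inv_sqrt_d[None, :]  # D^{-1} P_tau(beta)
    psi = float(np.sum(L_KV * gram))                          # L(K_V) . D^{-1} P
    return gram, psi, eigenvalues
```

Since $D^{1/2}\mathbf{1}$ lies in the kernel of the normalized matrix, the potential equals $\mathrm{Tr}(\exp(-\tau C)) - 1$, and increasing any $\beta_i$ can only decrease it; both facts are easy to check numerically with this sketch.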

3.1.5 Our Algorithm and its Analysis 

The algorithm proceeds as follows: At iteration $t$, it checks if the total deviation of $P^{(t)}$,
i.e., $\Psi(P^{(t)}, V)$, is sufficiently small (i.e., $P^{(t)}$ is mixing). In this case, we can
guarantee that no balanced cut of conductance less than $\gamma$ exists in $G$. In more formal
language, it appears below.

(A) (see Section 5) Let $S = \cup_{i=1}^{t} S^{(i)}$. For any $t \ge 1$, if
$\Psi(P^{(t)}, V) \le 1/\mathrm{poly}(n)$, and $\mathrm{vol}(S) \le b/100 \cdot 2m$, then
$L + \frac{1}{\tau} \sum_{i \in S} L(S_i) \succeq \Omega(\gamma) \cdot L(K_V)$. Moreover, no
$b$-balanced cut of conductance less than $\gamma$ exists in $G$.

This result has a simple explanation in terms of the AHK random walk $P^{(t)}$. Notice that
$P^{(t)}$ is accelerated only on a small unbalanced set $S$. Hence, if a balanced cut of conductance
less than $\gamma$ existed, its convergence could not be greatly helped by the acceleration over
$S$. Thus, if $P^{(t)}$ is still mixing very well, no such balanced cut can exist. On the other
hand, if $P^{(t)}$ has high total deviation (i.e., the walk has not yet mixed), then, intuitively,
some cut of low conductance exists in $G$. Formally, we show that the embedding given by $P^{(t)}$
has low quadratic form with respect to the Laplacian of $G$.

(B) (see Section 5) If $\Psi(P^{(t)}, V) \ge 1/\mathrm{poly}(n)$, then
$L \bullet D^{-1}P^{(t)} \le O(\gamma) \, L(K_V) \bullet D^{-1}P^{(t)}$.

From an SDP-rounding perspective, this means that the embedding $P^{(t)}$ can be used to recover a
cut $S^{(t)}$ of conductance $O(\sqrt{\gamma})$, using the SDP-rounding techniques from OV. If
$S^{(t)}$ or $\cup_{i=1}^{t} S^{(i)}$ is $\Omega(b)$-balanced, then we output that cut and
terminate. Otherwise, $S^{(t)}$ is unbalanced. In this case, we accelerate the convergence from
$S^{(t)}$ in the current AHK walk by increasing $(\beta^{(t)})_i$, for every $i \in S^{(t)}$, to
give $\beta^{(t+1)}$, and using $\beta^{(t+1)}$ to produce $P^{(t+1)}$ and move on to the next
iteration.

The analysis of our algorithm bounds the number of iterations by using the total deviation of
$P^{(t)}$ from stationarity as a potential function. Using the techniques of OV, it is possible to
show that, whenever an unbalanced cut $S^{(t)}$ is found, most of the deviation of $P^{(t)}$ can be
attributed to $S^{(t)}$. In words, we can think of $S^{(t)}$ as the main reason why $P^{(t)}$ is not
mixing. Formally,

(C) (see Section 5) At iteration $t$, if $\Psi(P^{(t)}, V) \ge 1/\mathrm{poly}(n)$ and $S^{(t)}$ is
not $b/100$-balanced, then w.h.p. $\Psi(P^{(t)}, S^{(t)}) \ge 1/2 \cdot \Psi(P^{(t)}, V)$.

Moreover, we can show that accelerating the convergence of the walk from $S^{(t)}$ has the effect of
removing from $P^{(t+1)}$ a large fraction of the deviation due to $S^{(t)}$. The proof is a simple
application of the Golden-Thompson inequality and mirrors the main step in the MMWU analysis. Hence,
we can show the total deviation of $P^{(t+1)}$ is just a constant fraction of that of $P^{(t)}$.

(D) (see Theorem 5.10) If $\Psi(P^{(t)}, V) \ge 1/\mathrm{poly}(n)$ and $S^{(t)}$ is not
$b/100$-balanced, then w.h.p.

$\Psi(P^{(t+1)}, V) \le \Psi(P^{(t)}, V) - 1/3 \cdot \Psi(P^{(t)}, S^{(t)}) \le 5/6 \cdot \Psi(P^{(t)}, V)$.

This potential reduction at every iteration allows us to argue that after $T = O(\log n)$
iterations, $P^{(T+1)}$ must have a small deviation from stationarity and yields a certificate that
no balanced cut of conductance less than $\gamma$ exists in $G$. Finally, to ensure that each
iteration requires only $\tilde{O}(m)$ time, we use the Johnson-Lindenstrauss Lemma to compute an
$O(\log n)$-dimensional approximation to the embedding $P^{(t)}$. To compute this approximation, we
rely on the results on approximating the matrix exponential discussed in Section 3.2.

3.1.6 Exponential Embeddings of Graph and Proof Ideas 

Now, we illustrate the usefulness of the exponential embedding obtained from the AHK random
walk, which is key to the proofs for (A) and (B) above. We suppress some details pertinent to our
algorithm. Consider the AHK walk in the first iteration, with $\beta^{(1)} = \mathbf{0}$. Let
$C = D^{-1/2}LD^{-1/2}$ and $P = P^{(1)} = D^{1/2}\exp(-\tau C)D^{-1/2}$. For the proof of (A), it
follows that if $\Psi(P, V) \le 1/\mathrm{poly}(n)$, then by definition
$L(K_V) \bullet D^{-1}P = \mathrm{Tr}(\exp(-\tau C)) - 1 \le 1/\mathrm{poly}(n)$. Hence,
$\lambda_2(C) \gtrsim \log n/\tau = \gamma$, by the choice of $\tau$. This lower bound on the second
eigenvalue certifies that $G$ has no cut of conductance at most $\gamma$. For our algorithm, at
iteration $t$, when $\beta^{(t)} \ne \mathbf{0}$, we will ensure that the Laplacians of the stars
have small enough weight ($\approx 1/\tau$) and small support
($\mathrm{vol}(\cup_{i=1}^{t} S^{(i)}) \le b/100 \cdot 2m$) for the argument above to still yield a
lower bound of $\gamma$ on the conductance of any $b$-balanced cut.

For (B), observe that
$L \bullet D^{-1}P = C \bullet \exp(-\tau C) = \sum_i \lambda_i e^{-\tau \lambda_i}$, where
$\lambda_i$, for $i = 1, \ldots, n$, are the eigenvalues of $C$. For eigenvalues larger than
$2\gamma$, $e^{-\tau \lambda_i}$ is bounded by $1/\mathrm{poly}(n)$ for $\tau = \log n/\gamma$. Since
eigenvalues of a normalized Laplacian are $O(1)$, the contribution to the sum above by eigenvalues
$\lambda_i > 2\gamma$ is at most $1/\mathrm{poly}(n)$ overall. This can be shown to be a small
fraction of the total sum. Hence, the quantity $L \bullet D^{-1}P$ is mostly determined by the
eigenvalues of value less than $2\gamma$ and we have
$L \bullet D^{-1}P \le O(\gamma) \cdot L(K_V) \bullet D^{-1}P$. The same analysis goes through when
$t > 1$.

3.2 Our Algorithms to Compute an Approximation to $\exp(-A)v$

In this section, we give an overview of the algorithms in Theorem 1.2 and Theorem 1.4 and their
proofs. The algorithm for Theorem 1.3 is very similar to the one for Theorem 1.2 and we give the
details in Section 6.3.2. A few quick definitions: A matrix $M$ is called Upper Hessenberg if
$M_{ij} = 0$ for $i > j + 1$. $M$ is called tridiagonal if $M_{ij} = 0$ for $i > j + 1$ and for
$j > i + 1$. Let $\lambda_1(M)$ and $\lambda_n(M)$ denote the largest and smallest eigenvalues of
$M$ respectively.

As we mention in the introduction, the matrices that we need to exponentiate for the BS
algorithm are no longer sparse or SDD. Thus, Theorem 1.2 is insufficient for our application.
Fortunately, the following theorem suffices and its proof is not very different from that of
Theorem 1.2, which is explained below. Its proof appears in Section 6.4.

Theorem 3.2 (Matrix Exponential Computation Beyond SDD) Given a vector $v$, a parameter
$\delta \le 1$ and an $n \times n$ symmetric matrix $A = \Pi HMH\Pi$ where $M$ is SDD, $H$ is a
diagonal matrix with strictly positive entries and $\Pi$ is a rank $(n - 1)$ projection matrix,
$\Pi = I - ww^\top$ ($w$ is explicitly known and $\|w\| = 1$), there is an algorithm that computes
a vector $u$ such that $\|\exp(-A)v - u\| \le \delta \|v\|$ in time
$\tilde{O}((m_M + n) \log(2 + \|HMH\|))$. The tilde hides $\mathrm{poly}(\log n)$ and
$\mathrm{poly}(\log 1/\delta)$ factors.

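Recall from Section 3.1.4 that our algorithm for BS requires us to compute $\exp(-A)v$ for a matrix
$A$ of the form $D^{-1/2}(L + \sum_i \beta_i L(S_i))D^{-1/2}$ where $\beta_i \ge 0$. We first note
that if we let $\Pi = I - \frac{1}{2m} \cdot (D^{1/2}\mathbf{1})(D^{1/2}\mathbf{1})^\top$, the
projection onto the space orthogonal to $\frac{1}{\sqrt{2m}} \cdot D^{1/2}\mathbf{1}$, then, for
each $i$, $D^{-1/2}L(S_i)D^{-1/2} = \Pi(\frac{d_i}{2m} \cdot I + e_i e_i^\top)\Pi$. Since
$D^{1/2}\mathbf{1}$ is an eigenvector of $D^{-1/2}LD^{-1/2}$, we have
$\Pi D^{-1/2}LD^{-1/2}\Pi = D^{-1/2}LD^{-1/2}$. Thus,

$A = \Pi D^{-1/2}LD^{-1/2}\Pi + \sum_i \beta_i \Pi(\frac{d_i}{2m} \cdot I + e_i e_i^\top)\Pi = \Pi D^{-1/2}(L + \sum_i \frac{\beta_i d_i}{2m} \cdot D + \sum_i \beta_i d_i \cdot e_i e_i^\top)D^{-1/2}\Pi.$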
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This is of the form YlHMHYl, where H = is diagonal and M is SDD. It is worth noting that 

since A itself may be neither sparse nor SDD, we cannot apply the Spielman-Teng SDD solver to 
approximate {I + olA)^^. The proof of the above theorem uses the Sherman-Morrison formula to 
extend the SDD solver to fit our requirement. Moreover, to obtain a version of Theorem 11.41 for 
such matrices, we do not have to do anything additional since multiplication by H and 11 take 
0(n) steps and hence, is still 0{mM + n). The details appear in Section [Ol Finally, note that in 
our application, ||HMH|| is poly(n). 

We now give an overview of the proofs of Theorem 1.2 and Theorem 1.4. First, we explain a
general method known as the Lanczos method, which is pervasive in numerical linear algebra.
We then show how suitable adaptations of this can be combined with (old and new) structural
results in approximation theory to obtain our results.

3.2.1 Lanczos Method 

Given an $n \times n$ symmetric PSD matrix $B$ and a function $f : \mathbb{R} \to \mathbb{R}$, we can define $f(B)$ as follows:
Let $u_1, \ldots, u_n$ be eigenvectors of $B$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Define $f(B) = \sum_i f(\lambda_i)u_iu_i^\top$. We
will reduce both our algorithms to computing $f(B)v$ for a given vector $v$, albeit with different $f$'s
and $B$'s. We point out the $f$'s and $B$'s required for Theorems 1.2 and 1.4 in Sections 3.2.2 and 3.2.3
respectively.

Since exact computation of $f(B)$ usually requires diagonalization of $B$, which could take as
much as $O(n^3)$ time (see [26]), we seek an approximation to $f(B)v$. The Lanczos method allows
us to do exactly that: It looks for an approximation to $f(B)v$ of the form $p(B)v$, where
$p$ is a polynomial of small degree, say $k$. Before we describe how, we note that it computes
this approximation in roughly $O((t_B + n)k)$ time plus the time it takes to compute $f(\cdot)$ on a
$(k+1) \times (k+1)$ tridiagonal matrix, which can often be upper bounded by $O(k^2)$ (see [26]).
Hence, the time is reduced to $O((t_B + n)k + k^2)$. What one has lost in this process is accuracy:
The candidate vector $u$ output by the Lanczos method is now only an approximation to $f(B)v$.
The quality of approximation, or $\|f(B)v - u\|$, can be upper bounded by the uniform error of the
best degree $k$ polynomial approximating $f$ in the interval $[\lambda_n(B), \lambda_1(B)]$. Roughly,
$\|f(B)v - u\| \lesssim \min_{p_k \in \Sigma_k} \sup_{x \in [\lambda_n(B), \lambda_1(B)]} |f(x) - p_k(x)|$. Here $\Sigma_k$ is the collection of all real polynomials of
degree at most $k$. Surprisingly, one does not need to know the best polynomial and proving existence
of good polynomials is sufficient. By increasing $k$, one can reduce this error and, indeed, if one lets
$k = n$, there is no error. Thus, the task is reduced to proving existence of low degree polynomials
that approximate $f$ within the error tolerable for the applications.

Computing the Best Polynomial Approximation. 

Now, we describe in detail the Lanczos method and how it achieves the error guarantee claimed
above. Notice that for any polynomial $p$ of degree at most $k$, the vector $p(B)v$ lies in
$\mathcal{K} = \mathrm{Span}\{v, Bv, \ldots, B^kv\}$, called the Krylov subspace. The Lanczos method iteratively creates an
orthonormal basis $\{v_i\}_{i=0}^k$ for $\mathcal{K}$ such that $\forall\, i \le k$, $\mathrm{Span}\{v_0, \ldots, v_i\} = \mathrm{Span}\{v, \ldots, B^iv\}$. Let $V_k$
be the $n \times (k+1)$ matrix with $\{v_i\}_{i=0}^k$ as its columns. Thus, $V_kV_k^\top$ denotes the projection onto
the Krylov subspace. We let $T_k$ be the $(k+1) \times (k+1)$ matrix expressing $B$ as an operator
restricted to $\mathcal{K}$ in the basis $\{v_i\}_{i=0}^k$, i.e., $T_k = V_k^\top BV_k$. Note that this is not just a change of basis,
since vectors in $\mathcal{K}$ can be mapped by $B$ to vectors outside $\mathcal{K}$. Now, since $v, Bv \in \mathcal{K}$, we must have
$Bv = (V_kV_k^\top)B(V_kV_k^\top)v = V_k(V_k^\top BV_k)V_k^\top v = V_kT_kV_k^\top v$. Iterating this argument, we get that for
all $i \le k$, $B^iv = V_kT_k^iV_k^\top v$, and hence, by linearity, $p(B)v = V_kp(T_k)V_k^\top v$, for any polynomial $p$ of
degree at most $k$.

Now, a natural approximation for $f(B)v$ is $V_kf(T_k)V_k^\top v$. Writing $r_k(x) = f(x) - p_k(x)$, where
$p_k$ is any degree $k$ approximation to $f(x)$, the error in the approximation is $f(B)v - V_kf(T_k)V_k^\top v =
r_k(B)v - V_kr_k(T_k)V_k^\top v$, for any choice of $p_k$. Hence, the norm of the error vector is at most
$(\|r_k(B)\| + \|r_k(T_k)\|)\,\|v\|$, which is bounded by the value of $r_k$ on the eigenvalues of $B$ and $T_k$ (the
eigenvalues of $T_k$ lie in the interval $[\lambda_n(B), \lambda_1(B)]$). More precisely, the norm of the error is bounded by
$2\|v\| \cdot \max_{\lambda \in [\lambda_n(B), \lambda_1(B)]} |f(\lambda) - p_k(\lambda)|$. Minimizing over $p_k$ gives the error bound claimed above. Note
that we do not explicitly need the approximating polynomial. It suffices to prove that there exists
a degree $k$ polynomial that uniformly approximates $f$ well on an interval containing the spectrum
of $B$ and $T_k$.

If we construct the basis iteratively as above, $Bv_j \in \mathrm{Span}\{v_0, \ldots, v_{j+1}\}$ by construction, and if
$i > j + 1$, $v_i$ is orthogonal to this subspace and hence $v_i^\top(Bv_j) = 0$. Thus, $T_k$ is Upper Hessenberg.
Moreover, if $B$ is symmetric, $v_j^\top(Bv_i) = v_i^\top(Bv_j)$, and hence $T_k$ is symmetric and tridiagonal. This
means that while constructing the basis, at step $i + 1$, it needs to orthonormalize $Bv_i$ only w.r.t.
$v_{i-1}$ and $v_i$. Thus the total time required is $O((t_B + n)k)$, plus the time required for the computation
of $f(T_k)$, which can typically be bounded by $O(k^2)$ for a tridiagonal matrix (using [26]). This
completes an overview of the Lanczos method. The Lanczos procedure described in Figure 4 in
the main body implements the Lanczos method. We now move on to describing how we apply it
to obtain our two algorithms.
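
The recurrence described above can be illustrated with a short sketch. The Python code below is an assumption-laden illustration, not the Lanczos procedure of Figure 4: the names `lanczos_exp` and `expm_neg` are hypothetical, matrices are dense lists of lists, and $f(T_k) = \exp(-T_k)$ is evaluated by a generic scaling-and-squaring Taylor series instead of a dedicated tridiagonal routine, so the $O(k^2)$ bound quoted above is not achieved here. It only shows how the three-term recurrence builds $V_k$ and $T_k$ and how the candidate vector $V_k f(T_k) V_k^\top v$ is assembled.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(B, x):
    return [dot(row, x) for row in B]

def mat_mul(X, Y):
    cols = list(zip(*Y))
    return [[dot(row, col) for col in cols] for row in X]

def expm_neg(T, squarings=10, terms=20):
    """exp(-T) for a small dense matrix via scaling and squaring (illustrative)."""
    n = len(T)
    scale = 2.0 ** squarings
    A = [[-t / scale for t in row] for row in T]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms + 1):          # Taylor series of exp(A)
        term = [[t / j for t in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):             # undo the scaling
        result = mat_mul(result, result)
    return result

def lanczos_exp(B, v, k):
    """Approximate exp(-B) v by V_k exp(-T_k) V_k^T v for symmetric B."""
    norm_v = math.sqrt(dot(v, v))
    basis = [[x / norm_v for x in v]]      # v_0
    alphas, betas = [], []                 # diagonal and off-diagonal of T_k
    for i in range(k + 1):
        w = matvec(B, basis[i])
        alpha = dot(w, basis[i])
        alphas.append(alpha)
        if i == k:
            break
        # Orthogonalize B v_i only against v_i and v_{i-1} (three-term recurrence).
        w = [wj - alpha * bj for wj, bj in zip(w, basis[i])]
        if i > 0:
            w = [wj - betas[i - 1] * bj for wj, bj in zip(w, basis[i - 1])]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:                   # Krylov subspace is invariant; stop early
            break
        betas.append(beta)
        basis.append([wj / beta for wj in w])
    m = len(basis)
    T = [[0.0] * m for _ in range(m)]
    for i in range(m):
        T[i][i] = alphas[i]
        if i + 1 < m:
            T[i][i + 1] = T[i + 1][i] = betas[i]
    E = expm_neg(T)
    # V_k^T v = ||v|| e_1, so only the first column of exp(-T_k) is needed.
    coeffs = [norm_v * E[i][0] for i in range(m)]
    return [sum(coeffs[i] * basis[i][j] for i in range(m)) for j in range(len(v))]
```

With $k + 1$ equal to the dimension of the full Krylov subspace the output agrees with $\exp(-B)v$ up to rounding; for smaller $k$ the error is governed by the polynomial approximation bound discussed above. No reorthogonalization is performed, which is acceptable only for small $k$.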

3.2.2 Approximating $\exp(-A)v$ Using a Rational Approximation to $e^{-x}$

Our Algorithm. 

The starting point of the algorithm that underlies Theorem 1.2 is a rather surprising result by
Saff, Schönhage and Varga (SSV) [30], which says that for any integer $k$, there exists a degree $k$
polynomial $p_k^*$ such that $p_k^*((1 + x/k)^{-1})$ approximates $e^{-x}$ up to an error of $O(k \cdot 2^{-k})$ over the
interval $[0, \infty)$ (Theorem 6.8, Corollary 6.9). Then, to compute $\exp(-A)v$, one could apply the
Lanczos method with $B = (I + A/k)^{-1}$ and $f(x) = e^{k(1 - 1/x)}$. Essentially, this was the method
suggested by Eshof and Hochbruck [13]. The strong approximation guarantee of the SSV result
along with the guarantee of the Lanczos method from the previous section, would imply that the
order of the Krylov subspace for $B$ required would be roughly $\log 1/\delta$, and hence, independent of
$\|A\|$. The running time is then dominated by the computation $Bv = (I + A/k)^{-1}v$.

EH note that the computation of exact matrix inverse is a costly operation ($O(n^3)$ time in
general) and all known faster methods for inverse computation incur some error. They suggest using
the Lanczos method with faster iterative methods, e.g. Conjugate Gradient, for computing the
inverse (or rather the product of the inverse with a given vector) as a heuristic. They make no
attempt to give a theoretical justification of why approximate computation suffices. Also note that,
even if the computation was error-free, a method such as Conjugate Gradient will have running
time which varies with $\sqrt{\lambda_1(A)/\lambda_n(A)}$ in general. Thus, the EH method falls substantially short of
resolving the hypothesis mentioned in the introduction.

To be able to prove Theorem 1.2 using the SSV guarantee, we have to adapt the Lanczos
method in several ways, and hence, deviate from the method suggested by EH: 1) EH construct
$T_k$ as a tridiagonal matrix as the Lanczos method suggests, but since the computation is no longer
exact, the basis $\{v_i\}_{i=0}^k$ is no longer guaranteed to be orthonormal. As a result, the proofs of the
Lanczos method break down. Our algorithm, instead, builds an orthonormal basis, which means
that $T_k$ becomes an Upper Hessenberg matrix instead of tridiagonal and we need to compute $k^2$
dot products in order to compute $T_k$. 2) With $T_k$ being asymmetric, several nice spectral properties
are lost, e.g. real eigenvalues and an orthogonal set of eigenvectors. We overcome this fact by
symmetrizing $T_k$ to construct $\hat{T}_k = (T_k^\top + T_k)/2$ and computing our approximation with $\hat{T}_k$. This
permits us to bound the quality of a polynomial approximation applied to $\hat{T}_k$ by the behavior of the
polynomial on the eigenvalues of $\hat{T}_k$. 3) Our analysis is based on the SSV approximation result,
which is better than the variant proved and used by EH. Moreover, for their shifting technique,
which is the source of the $\|\exp(-A)\|$ factor in the hypothesis, the given proof in EH is incorrect
and it is not clear if the given bound could be achieved even under exact computation*. 4) Most
importantly, since $A$ is SDD, we are able to employ the Spielman-Teng solver (Theorem 6.10) to
approximate $(I + A/k)^{-1}v$. This procedure, called ExpRational, has been described in Figure 5
in the main body.

Error Analysis. 

To complete the proof of Theorem 1.2, we need to analyze the role of the error that creeps in
due to approximate matrix inversion. The problem is that this error, generated in each iteration
of the Krylov basis computation, propagates to the later steps. Thus, small errors in the inverse
computation may lead to the basis $V_k$ computed by our algorithm to be quite far from the $k$-th
order Krylov basis for $B, v$. We first show that, assuming the error in computing the inverse is
small, $\hat{T}_k$ can be used to approximate degree $k$ polynomials of $B = (I + A/k)^{-1}$ when restricted to
the Krylov subspace, i.e. $\|p(B)v - V_kp(\hat{T}_k)V_k^\top v\| \lesssim \|p\|_1$. Here, if $p = \sum_{i \ge 0} a_i \cdot x^i$, $\|p\|_1 = \sum_{i \ge 0} |a_i|$.
This is the most technical part of the error analysis and unfortunately, the only way we know of
proving the error bound above is by tour de force. A part of this proof is to show that the spectrum
of $\hat{T}_k$ cannot shift far from the spectrum of $B$.

To bound the error in the candidate vector output by the algorithm, i.e. $\|f(B)v - V_kf(\hat{T}_k)V_k^\top v\|$,
we start by expressing $e^{-x}$ as the sum of a degree $k$ polynomial $p_k$ in $(1 + x/k)^{-1}$ and a remainder
function $r_k$. We use the analysis from the previous paragraph to upper bound the error in the
polynomial part by $\approx \|p_k\|_1$. We bound the contribution of the remainder term to the error by bounding
$\|r_k(B)\|$ and $\|r_k(\hat{T}_k)\|$. This step uses the fact that eigenvalues of $r_k(\hat{T}_k)$ are $\{r_k(\lambda_i)\}_i$, where $\{\lambda_i\}_i$
are eigenvalues of $\hat{T}_k$. This is the reason our algorithm symmetrizes $T_k$ to $\hat{T}_k$. To complete the error
analysis, we use the polynomials $p_k^*$ from SSV and bound $\|p_k^*\|_1$. Even though we do not know $p_k^*$
explicitly, we can bound its coefficients indirectly by writing it as an interpolation polynomial. All
these issues make the error analysis highly technical. However, since the error analysis is crucial
for our algorithms, a more illuminating proof is highly desirable.

3.2.3 Approximation Using Our Polynomial Approximation to $e^{-x}$

More straightforwardly, combining the Lanczos method with the setting $B = A$ and $f(x) = e^{-x}$
along with the polynomial approximation to $e^{-x}$ that we prove in Theorem 1.5, we get that setting
$k \approx \sqrt{\lambda_1(A) - \lambda_n(A)} \cdot \mathrm{poly}(\log 1/\delta)$ suffices to obtain a vector $u$ that satisfies $\|\exp(-A)v - u\| \le
\delta \|v\| \|\exp(-A)\|$. This gives us our second method for approximating $\exp(-A)v$. Note that this
algorithm avoids any inverse computation and, as a result, the procedure and the proofs are simpler
and the algorithm practical.

*EH show the existence of degree $k$ polynomials in $(1 + \nu x)^{-1}$ for any constant $\nu \in (0, 1)$, that approximate $e^{-x}$ up to
an error of $\exp(1/2\nu - \Theta(\sqrt{k(\nu^{-1} - 1)}))$. In order to deduce the claimed hypothesis, it needs to be used for $\nu \approx 1/\lambda_n(A)$,
in which case, there is a factor of $e^{\lambda_n(A)}$ in the error, which could be huge.

3.3 Our Uniform Approximation for $e^{-x}$

In this section, we give a brief overview of the proof of Theorem 1.5. The details appear in Section
7 and can be read independently of the rest of the paper.

A straightforward approach to approximate $e^{-x}$ over $[a, b]$ is by truncating its series expansion
around $\frac{a+b}{2}$. With a degree of the order of $(b - a) + \log 1/\delta$, these polynomials achieve an error of
$\delta \cdot e^{-(a+b)/2}$, for any constant $\delta > 0$. This approach is equivalent to approximating $e^{\lambda x}$ over $[-1, 1]$,
for $\lambda = (b-a)/2$, by polynomials of degree $O(\lambda + \log 1/\delta)$. On the flip side, it is known that if $\lambda$
is constant, the above result is optimal (see e.g. [31]). Instead of polynomials, one could consider
approximations by rational functions, as in [12, 39]. However, the author in [31] shows that, if both
$\lambda$ and the degree of the denominator of the rational function are constant, the required degree of
the numerator is only an additive constant better than that for the polynomials. It might seem that
the question of approximating the exponential has been settled and one cannot do much better.
However, the result by SSV mentioned before seems surprising in this light. The lower bound
does not apply to their result, since the denominator of their rational function is unbounded. In a
similar vein, we ask the following question: If we are looking for weaker error bounds, e.g. $\delta \cdot e^{-a}$
instead of $\delta \cdot e^{-(a+b)/2}$ (recall $b > a$), can we improve on the degree bound of $O((b - a) + \log 1/\delta)$?
Theorem 1.5 answers this question in the affirmative and gives a new upper bound and an almost
matching lower bound. We give an overview of the proofs of both these results next.
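
The straightforward truncated-series approach mentioned at the start of this discussion can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `taylor_exp_neg` and `max_error` are hypothetical, and the error is estimated on a finite grid rather than bounded analytically. It evaluates the Taylor polynomial of $e^{-x}$ about the midpoint of $[a, b]$ and reports its largest observed deviation from $e^{-x}$.

```python
import math

def taylor_exp_neg(x, a, b, degree):
    """Degree-`degree` Taylor polynomial of e^{-x} about the midpoint of [a, b]."""
    c = (a + b) / 2.0
    term, total = 1.0, 1.0
    for j in range(1, degree + 1):
        term *= -(x - c) / j      # (-(x - c))^j / j!
        total += term
    return math.exp(-c) * total   # e^{-x} = e^{-c} * e^{-(x - c)}

def max_error(a, b, degree, samples=1001):
    """Largest absolute error over an evenly spaced grid on [a, b]."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(-x) - taylor_exp_neg(x, a, b, degree)) for x in xs)
```

For example, on $[0, 10]$ a degree well above $b - a$ gives an error far below $e^{-(a+b)/2}$, whereas a degree below $(b - a)/2$ does not; this matches the linear dependence on $b - a$ that the paragraph above attributes to this approach.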

Upper Bound. 

We wish to show that there exists a polynomial of degree of the order of $\sqrt{b - a} \cdot \mathrm{poly}(\log 1/\delta)$ that
approximates $e^{-x}$ on the interval $[a, b]$, up to an error of $\delta \cdot e^{-a}$ for any $\delta > 0$. Our approach is to
approximate $(1 + x/k)^{-1}$ on the interval $[a, b]$ by a polynomial $q$ of degree $l$, and then compose the
polynomial $p_k^*$ from the SSV result with $q$, to obtain $p_k^*(q(x))$, which is a polynomial of degree $k \cdot l$
approximating $e^{-x}$ over $[a, b]$. Thus, we are looking for polynomials $q$ that minimize $|q(x) - 1/x|$
over $[1 + a/k, 1 + b/k]$. Slightly modifying the optimization, we consider polynomials $q$ that
minimize $|x \cdot q(x) - 1|$ over $[1 + a/k, 1 + b/k]$. In Section 7, we show that the solution to this modified
optimization can be derived from the well-known Chebyshev polynomials. For the right choice
of $k$ and $l$, the composition of the two polynomials approximates $e^{-x}$ to within an error of $\delta \cdot e^{-a}$
over $[a, b]$, and has degree $\sqrt{b - a} \cdot \mathrm{poly}(\log 1/\delta)$. To bound the error in the composition step, we
need to bound the sum of absolute values of coefficients of $p_k^*$, which we achieve by rewriting $p_k^*$
as an interpolation polynomial. The details appear in Section 7.

Lower Bound.

As already noted, since we consider a weaker error bound $\delta \cdot e^{-a}$ and $\lambda = (b-a)/2$ isn't a constant
for our requirements, the lower bounds mentioned above no longer hold. Nevertheless, we prove
that the square-root dependence on $b - a$ of the required degree is optimal. The proof is simple
and we give the details here: Using a theorem of Markov from approximation theory (see [10]),
we show that any polynomial approximating $e^{-x}$ over the interval $[a, b]$ up to an error of $\delta \cdot e^{-a}$,
for some constant $\delta$ small enough, must have degree of the order of $\sqrt{b - a}$. Markov's theorem
says that the absolute value of the derivative of a univariate polynomial $p$ of degree $k$, which lives
in a box of height $h$ over an interval of width $w$, is upper bounded by $k^2h/w$. Let $p_k$ be a polynomial
of degree $k$ that $\delta \cdot e^{-a}$-approximates $e^{-x}$ in the interval $[a, b]$. If $b$ is large enough and $\delta$ a small
enough constant, then one can get a lower bound of $\Omega(e^{-a})$ on the derivative of $p_k$ using the Mean
Value Theorem. Also, one can obtain an upper bound of $O(e^{-a})$ on the height of the box in which
$p_k$ lives. Both these bounds use the fact that $p_k$ approximates $e^{-x}$ and is $\delta \cdot e^{-a}$ close to it. Since
the width of the box is $b - a$, these two facts, along with Markov's theorem, immediately imply
a lower bound of $\Omega(\sqrt{b - a})$ on $k$. This shows that our upper bound is tight up to a factor of
$\mathrm{poly}(\log 1/\delta)$.

4 Discussion and Open Problems 

In this paper, using techniques from disparate areas such as random walks, SDPs, numerical linear
algebra and approximation theory, we have settled the question of designing an asymptotically
optimal $\tilde{O}(m)$ spectral algorithm for BS (Theorem 1.1) and, along with it, provided a simple and
practical algorithm (Theorem 3.1). However, there are several outstanding problems that emerge from
our work.

The main remaining open question regarding the design of spectral algorithms for BS is whether
it is possible to obtain stronger certificates that no sparse balanced cuts exist, in nearly-linear time.
This question is of practical importance in the construction of decompositions of the graph into
induced graphs that are near-expanders, in nearly-linear time [38]. OV show that their certificate,
which is of the same form as that of BalSep, is stronger than the certificate of Spielman and
Teng [38]. In particular, our certificate can be used to produce decompositions into components
that are guaranteed to be subsets of induced expanders in $G$. However, this form of certificate is
still much weaker than that given by RLE, which actually outputs an induced expander of large
volume.

With regards to approximating the matrix exponential, a computation which plays an important
role in SDP-based algorithms, random walks, numerical linear algebra and quantum computing,
settling the hypothesis remains the main open question. Further, as noted earlier, the error
analysis plays a crucial role in making Theorem 1.2 and, hence, Theorem 1.1 work, but its proof is
rather long and difficult. A more illuminating proof of this would be highly desirable.

Another question is to close the gap between the upper and lower bounds on polynomial
approximations to $e^{-x}$ over an interval $[a, b]$ in Theorem 1.5.

5 The Algorithm for Balanced Separator 

In this section we provide our spectral algorithm BalSep and prove Theorem 1.1. We also mention
how Theorem 3.1 follows easily from the proof of Theorem 1.1 and Theorem 1.4. We first present
the preliminaries for this section.

5.1 Basic Preliminaries 

Instance Graph and Edge Volume. 

We denote by $G = (V, E)$ the unweighted instance graph, where $V = [n]$ and $|E| = m$. We assume
$G$ is connected. We let $d \in \mathbb{R}^n$ be the degree vector of $G$, i.e. $d_i$ is the degree of vertex $i$. For a
subset $S \subseteq V$, we define the edge volume as $\mathrm{vol}(S) = \sum_{i \in S} d_i$. The total volume of $G$ is $2m$. The
conductance of a cut $(S, \bar{S})$ is defined to be $\phi(S) = |E(S, \bar{S})| / \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}$, where $\mathrm{vol}(S)$ is the sum of
the degrees of the vertices in the set $S$. Moreover, a cut $(S, \bar{S})$ is $b$-balanced if $\min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\} \ge
b \cdot \mathrm{vol}(V)$.
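
To make these definitions concrete, here is a small Python sketch that computes the volume, conductance and balance of a cut. It is an illustration only: the adjacency-list representation (a dict mapping each vertex to a list of its neighbours) and the function names `volume`, `conductance` and `is_balanced` are assumptions introduced here and are not part of the paper.

```python
def volume(adj, subset):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[v]) for v in subset)

def conductance(adj, subset):
    """phi(S) = |E(S, S-bar)| / min{vol(S), vol(S-bar)} for an unweighted graph."""
    s = set(subset)
    complement = set(adj) - s
    cut_edges = sum(1 for u in s for w in adj[u] if w not in s)
    return cut_edges / min(volume(adj, s), volume(adj, complement))

def is_balanced(adj, subset, b):
    """True if min{vol(S), vol(S-bar)} >= b * vol(V)."""
    s = set(subset)
    complement = set(adj) - s
    total = volume(adj, adj)      # vol(V) = 2m
    return min(volume(adj, s), volume(adj, complement)) >= b * total
```

On two triangles joined by a single edge, the cut separating the triangles has one crossing edge and volume 7 on each side, so its conductance is $1/7$ and it is $1/2$-balanced, whereas a single degree-2 vertex forms a cut of conductance 1 that is not even $1/4$-balanced.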

Special Graphs 

We denote by $K_V$ the complete graph with weight $d_id_j/2m$ between every pair $i, j \in V$. For $i \in V$, $S_i$
is the star graph rooted at $i$, with edge weight of $d_id_j/2m$ between $i$ and $j$, for all $j \in V$.

Graph matrices. 

For an undirected graph $H = (V, E_H)$, let $A(H)$ denote the adjacency matrix of $H$, and $D(H)$
the diagonal matrix of degrees of $H$. The (combinatorial) Laplacian of $H$ is defined as $L(H) =
D(H) - A(H)$. Note that for all $x \in \mathbb{R}^V$, $x^\top L(H)x = \sum_{\{i,j\} \in E_H} (x_i - x_j)^2$. By $D$ and $L$, we denote
$D(G)$ and $L(G)$ respectively for the input graph $G$. Finally, the natural random walk over $G$ has
transition matrix $W = AD^{-1}$.

Vector and Matrix Notation. 

We are working within the vector space $\mathbb{R}^n$. We will denote by $I$ the identity matrix over this
space. For a symmetric matrix $A$, we will use $A \succeq 0$ to indicate that $A$ is positive semi-definite.
The expression $A \succeq B$ is equivalent to $A - B \succeq 0$. For two matrices $A, B$ of equal dimensions,
let $A \bullet B = \mathrm{Tr}(A^\top B) = \sum_{ij} A_{ij} \cdot B_{ij}$. We denote by $\{e_i\}_{i=1}^n$ the standard basis for $\mathbb{R}^n$. $\mathbf{0}$ and $\mathbf{1}$ will
denote the all 0s and all 1s vectors respectively.

Fact 5.1 $L(K_V) = D - \frac{1}{2m} \cdot D\mathbf{1}\mathbf{1}^\top D = D^{1/2}(I - \frac{1}{2m} \cdot D^{1/2}\mathbf{1}\mathbf{1}^\top D^{1/2})D^{1/2}$.

Embedding Notation.

We will deal with vector embeddings of $G$, where each vertex $i \in V$ is mapped to a vector $v_i \in \mathbb{R}^d$,
for some $d \le n$. For such an embedding $\{v_i\}_{i \in V}$, we denote by $v_{avg}$ the mean vector, i.e. $v_{avg} =
\sum_{i \in V} \frac{d_i}{2m} \cdot v_i$. Given a vector embedding $\{v_i \in \mathbb{R}^d\}_{i \in V}$, recall that $X$ is the Gram matrix of the
embedding if $X_{ij} = v_i^\top v_j$. A Gram matrix $X$ is always PSD, i.e., $X \succeq 0$. For any $X \in \mathbb{R}^{n \times n}$, $X \succeq 0$,
we call $\{v_i\}_{i \in V}$ the embedding corresponding to $X$ if $X$ is the Gram matrix of $\{v_i\}_{i \in V}$. For $i \in V$, we
denote by $R_i$ the matrix such that $R_i \bullet X = \|v_i - v_{avg}\|^2$.

Fact 5.2 $\sum_{i \in V} d_iR_i \bullet X = \sum_{i \in V} d_i \|v_i - v_{avg}\|^2 = \frac{1}{2m} \cdot \sum_{i < j} d_jd_i \|v_i - v_j\|^2 = L(K_V) \bullet X$.

5.2 AHK Random Walks 

The random-walk processes used by our algorithm are continuous-time Markov processes [27]
over $V$. In these processes, state transitions do not take place at specified discrete intervals, but
follow exponential distributions described by a transition rate matrix $Q \in \mathbb{R}^{n \times n}$, where $Q_{ij}$ specifies
the rate of transition from vertex $j$ to $i$. More formally, letting $p(t) \in \mathbb{R}^n$ be the probability
distribution of the process at time $t \ge 0$, we have that $\frac{\partial p(t)}{\partial t} = Qp(t)$. Given a transition rate
matrix† $Q$, the differential equation for $p(t)$ implies that $p(t) = e^{tQ}p(0)$. In this paper, we will be
interested in a class of continuous-time Markov processes over $V$ that take into account the edge
structure of $G$. The simplest such process is the heat kernel process, which is defined as having
transition rate matrix $Q = -(I - W) = -LD^{-1}$. The heat kernel can also be interpreted as the
probability transition matrix of the following discrete-time random walk: sample a number of
steps $i$ from a Poisson distribution with mean $t$ and perform $i$ steps of the natural random walk
over $G$:

$$p(t) = e^{-tLD^{-1}}p(0) = e^{-t(I - W)}p(0) = e^{-t}\sum_{i=0}^{\infty} \frac{t^i}{i!}W^ip(0).$$

†A matrix $Q$ is a valid transition rate matrix if its diagonal entries are non-positive and its off-diagonal entries are
non-negative. Moreover, it must be that $\mathbf{1}^\top Q = 0$, to ensure that probability mass is conserved.
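
The Poisson interpretation of the heat kernel lends itself to a direct numerical sketch. The Python code below is a minimal illustration under assumptions that are not in the paper: the graph is an adjacency-list dict, the names `walk_step` and `heat_kernel` are hypothetical, and the infinite sum is simply truncated after a fixed number of terms, which is adequate only when that number comfortably exceeds $t$.

```python
import math

def walk_step(adj, p):
    """One step of the natural random walk: p -> W p with W = A D^{-1}."""
    q = {v: 0.0 for v in adj}
    for u, mass in p.items():
        share = mass / len(adj[u])     # mass at u is split evenly over its edges
        for w in adj[u]:
            q[w] += share
    return q

def heat_kernel(adj, p0, t, terms=200):
    """Approximate e^{-t(I - W)} p0 by the truncated Poisson-weighted sum."""
    result = {v: 0.0 for v in adj}
    weight = math.exp(-t)              # Poisson(t) probability of taking 0 steps
    p = dict(p0)
    for i in range(terms):
        for v in adj:
            result[v] += weight * p[v]
        p = walk_step(adj, p)          # distribution after i + 1 walk steps
        weight *= t / (i + 1)          # Poisson(t) probability of i + 1 steps
    return result
```

Started from a single vertex of a small connected graph and run for a large time $t$, the output approaches the stationary distribution, in which vertex $i$ has mass $d_i/2m$; at $t = 0$ it returns the starting distribution unchanged.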

For the construction of our algorithm, we generalize the concept of heat kernel to a larger class
of continuous-time Markov processes, which we name Accelerated Heat Kernel (AHK) processes. A
process $\mathcal{H}(\beta)$ in this class is defined by a non-negative vector $\beta \in \mathbb{R}^n$ and the transition rate matrix
of $\mathcal{H}(\beta)$ is $Q(\beta) = -(L + \sum_{i \in V} \beta_iL(S_i))D^{-1}$. As this is the negative of a sum of Laplacian matrices,
it is easy to verify that it is a valid transition rate matrix. The effect of adding the star terms to the
transition rate matrix is that of accelerating the convergence of the process to stationary at vertices
$i$ with large value of $\beta_i$, as a large fraction of the probability mass that leaves these vertices is
distributed uniformly over the edges. We denote by $P_\tau(\beta)$ the probability-transition matrix of
$\mathcal{H}(\beta)$ between time $0$ and $\tau$, i.e. $P_\tau(\beta) = e^{\tau Q(\beta)}$.
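
The claim that the AHK transition rate matrix is valid can be checked numerically on a small graph. The sketch below is illustrative only and rests on assumptions introduced here: dense list-of-lists matrices, an edge list for $G$, and the hypothetical helper names `laplacian` and `ahk_rate_matrix`. It builds $-(L + \sum_i \beta_i L(S_i))D^{-1}$, with each star $S_i$ carrying weight $d_id_j/2m$ on the edge between $i$ and $j$.

```python
def laplacian(n, weighted_edges):
    """Combinatorial Laplacian of a weighted graph given as (i, j, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def ahk_rate_matrix(n, edges, beta):
    """Q(beta) = -(L + sum_i beta_i L(S_i)) D^{-1} for an unweighted graph G."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    two_m = sum(deg)
    M = laplacian(n, [(i, j, 1.0) for i, j in edges])
    for i in range(n):
        if beta[i] == 0:
            continue
        # Star rooted at i; the j = i term is a self-loop and contributes nothing.
        star = laplacian(n, [(i, j, deg[i] * deg[j] / two_m)
                             for j in range(n) if j != i])
        for r in range(n):
            for c in range(n):
                M[r][c] += beta[i] * star[r][c]
    # Right-multiplying by D^{-1} scales column c by 1 / d_c.
    return [[-M[r][c] / deg[c] for c in range(n)] for r in range(n)]
```

For any non-negative $\beta$, the columns of the resulting matrix sum to zero, the diagonal is non-positive and the off-diagonal entries are non-negative; in addition $Q(\beta)d = 0$, so the degree vector remains stationary regardless of the acceleration.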



Embedding View. 

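A useful matrix to study $\mathcal{H}(\beta)$ will be $D^{-1}P_{2\tau}(\beta)$. This matrix describes the probability
distribution over the edges of $G$ and has the advantage of being symmetric and positive semidefinite:

$$D^{-1}P_{2\tau}(\beta) = D^{-1/2}\, e^{-2\tau D^{-1/2}(L + \sum_{i \in V} \beta_iL(S_i))D^{-1/2}}\, D^{-1/2}.$$

Moreover, we have the following fact:

Fact 5.3 $D^{-1/2}P_\tau(\beta)$ is a square root of $D^{-1}P_{2\tau}(\beta)$.

Proof: $(D^{-1/2}P_\tau(\beta))^\top (D^{-1/2}P_\tau(\beta)) = P_\tau(\beta)^\top D^{-1}P_\tau(\beta) = D^{-1}P_\tau(\beta)P_\tau(\beta) = D^{-1}P_{2\tau}(\beta)$, where the second
equality uses that $D^{-1}P_\tau(\beta)$ is symmetric. ■

Hence, $D^{-1}P_{2\tau}(\beta)$ is the Gram matrix of the embedding given by the columns of its square root
$D^{-1/2}P_\tau(\beta)$. This property will enable us to use geometric SDP techniques to analyze $\mathcal{H}(\beta)$.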



Mixing. 

Spectral methods for finding low-conductance cuts are based on the idea that random walk processes
mix slowly across sparse cuts, so that it is possible to detect such cuts by considering the
starting vertices for which the probability distribution of the process strongly deviates from
stationary. We measure this deviation for vertex $i$ at time $\tau$ by the $\ell_2^2$-norm of the distance between
$P_\tau(\beta)e_i$ and the uniform distribution over the edges of $G$. We denote it by $\Psi(P_\tau(\beta), i)$:

$$\Psi(P_\tau(\beta), i) = d_i \sum_{j \in V} d_j \left(\frac{e_j^\top P_\tau(\beta)e_i}{d_j} - \frac{1}{2m}\right)^2.$$

A fundamental quantity for our algorithm will be the total deviation from stationarity over a
subset $S \subseteq V$. We will denote $\Psi(P_\tau(\beta), S) = \sum_{i \in S} \Psi(P_\tau(\beta), i)$. In particular, $\Psi(P_\tau(\beta), V)$ will play
the role of potential function in our algorithm. The following facts express these mixing quantities
in the geometric language of the embedding corresponding to $D^{-1}P_{2\tau}(\beta)$.

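Fact 5.4 $\Psi(P_\tau(\beta), i) = d_iR_i \bullet D^{-1}P_{2\tau}(\beta)$.

Proof: By Fact 5.3 and the definition of $R_i$:

$$d_iR_i \bullet D^{-1}P_{2\tau}(\beta) = d_i \Big\| D^{-1/2}P_\tau(\beta)e_i - \sum_{j \in V} \frac{d_j}{2m} D^{-1/2}P_\tau(\beta)e_j \Big\|^2 = d_i \Big\| D^{-1/2}P_\tau(\beta)e_i - \frac{1}{2m} D^{1/2}\mathbf{1} \Big\|^2 = d_i \Big\| D^{1/2}\Big(D^{-1}P_\tau(\beta)e_i - \frac{\mathbf{1}}{2m}\Big) \Big\|^2 = \Psi(P_\tau(\beta), i).$$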



The following is a consequence of Fact 5.2.

Fact 5.5 $\Psi(P_\tau(\beta), V) = \sum_{i \in V} d_iR_i \bullet D^{-1}P_{2\tau}(\beta) = L(K_V) \bullet D^{-1}P_{2\tau}(\beta)$.

5.3 Algorithm Description 

Preliminaries

All the random walks in our algorithm will be run for time $\tau = O(\log n)/\gamma$. We will consider
embeddings given by the columns of $D^{-1/2}P_\tau(\beta)$ for some choice of $\beta$. Because we want our algorithm
to run in time $\tilde{O}(m)$ and we are only interested in Euclidean distances between vectors in the
embedding, we will use the Johnson-Lindenstrauss Lemma (see Lemma TS.lSl in Section l53)) to obtain
an $O(\log n)$-dimensional embedding approximately preserving distances between columns
of $D^{-1/2}P_\tau(\beta)$ up to a factor of $(1 + \varepsilon)$, where $\varepsilon$ is a constant such that $(1+\varepsilon)/(1-\varepsilon) \le 4/3$.

Our algorithm BalSep will call two subroutines FindCut and ExpV. FindCut is an SDP-rounding
algorithm that uses random projections and radial sweeps to find a low-conductance
cut that is either $c$-balanced, for some constant $c = \Omega(b) \le 1/100$ defined in OV, or obeys a strong
guarantee stated in Theorem 5.8. Such an algorithm is implicit in [25] and is described precisely in
Section lS^l. ExpV is a generic algorithm that approximately computes products of the form $P_\tau(\beta)u$
for unit vectors $u$. ExpV can be chosen to be either the algorithm implied by Theorem 3.2, which
makes use of the Spielman-Teng solver, or that in Theorem 1.4, which just applies the Lanczos
method.

We are now ready to describe BalSep, which will output a $c$-balanced cut of conductance
$O(\sqrt{\gamma})$ or the string NO, if it finds a certificate that no $b$-balanced cut of conductance less than $\gamma$
exists. BalSep can also fail and output the string FAIL. We will show that this only happens with
small probability. The algorithm BalSep is defined in Figure 1. The constants in this presentation
are not optimized and are likely to be higher than what is necessary in practice. They can also
be modified to obtain different trade-offs between the approximation guarantee and the output
balance.

At iteration $t = 1$, we have $\beta^{(1)} = 0$, so that $P^{(1)}$ is just the probability transition matrix of
the heat kernel on $G$ for time $\tau$. In general at iteration $t$, BalSep runs ExpV to compute $O(\log n)$
Input: An unweighted connected instance graph $G = (V, E)$, a constant balance value $b \in
(0, 1/2]$, a conductance value $\gamma \in [1/n^2, 1)$.

Let $S = \emptyset$, $\bar{S} = V$. Set $\tau = \log n/12\gamma$ and $\beta^{(1)} = 0$.
At iteration $t = 1, \ldots, T = 12 \log n$:

1. Denote $P^{(t)} = P_\tau(\beta^{(t)})$. Pick $k = O(\log n/\varepsilon^2)$ random unit vectors $\{u_1^{(t)}, u_2^{(t)}, \ldots, u_k^{(t)} \in \mathbb{R}^n\}$
and use the subroutine ExpV to compute the embedding $\{v_i^{(t)} \in \mathbb{R}^k\}_{i \in V}$ obtained by projecting
the columns of $D^{-1/2}P^{(t)}$ onto these vectors (suitably rescaled).
Let $X^{(t)}$ be the Gram matrix corresponding to this embedding.

2. If $L(K_V) \bullet X^{(t)} = \sum_{i \in V} d_i \|v_i^{(t)} - v_{avg}^{(t)}\|^2 <$ output NO and terminate.

3. Otherwise, run FindCut$(G, b, \gamma, \{v_i^{(t)}\}_{i \in V})$. FindCut outputs a cut $S^{(t)}$ with $\phi(S^{(t)}) \le
O(\sqrt{\gamma})$ or fails, in which case we also output FAIL and terminate.

4. If $S^{(t)}$ is $c$-balanced, output $S^{(t)}$ and terminate. If not, update $S = S \cup S^{(t)}$. If $S$ is
$c$-balanced, output $S$ and terminate.
5. Otherwise, update /3(^+^) = ji^*^ + ^ L/esW ^' proceed to the next iteration. 
Output NO and terminate. 

Figure 1: The BalSep Algorithm 

random projections of $P^{(t)}$ and constructs an approximation $\{v_i^{(t)}\}_{i \in V}$ to the embedding given by
the columns of $D^{-1/2}P^{(t)}$. This approximate embedding has Gram matrix $X^{(t)}$.

In Step 2, BalSep computes $L(K_V) \bullet X^{(t)}$, which is an estimate of the total deviation $\Psi(P^{(t)}, V)$
by Fact 5.5. If this deviation is small, the AHK walk $P^{(t)}$ has mixed sufficiently over $G$ to yield
a certificate that $G$ cannot have any $b$-balanced cut of conductance less than $\gamma$. This is shown
in Lemma 5.6. If the AHK walk $P^{(t)}$ has not mixed sufficiently, we can use FindCut to find a
cut $S^{(t)}$ of low conductance $O(\sqrt{\gamma})$, which is an obstacle for mixing. If $S^{(t)}$ is $c$-balanced, we
output it and terminate. Similarly, if $S \cup S^{(t)}$ is $c$-balanced, as $\phi(S \cup S^{(t)}) \le O(\sqrt{\gamma})$, we can also
output $S \cup S^{(t)}$ and exit. Otherwise, $S^{(t)}$ is unbalanced and is potentially preventing BalSep from
detecting balanced cuts in $G$. We then proceed to modify the AHK walk, by increasing the values
of $\beta^{(t+1)}$ for the vertices in $S^{(t)}$. This change ensures that $P^{(t+1)}$ mixes faster from the vertices in
$S^{(t)}$ and in particular mixes across $S^{(t)}$. In particular, this means that, at any given iteration $t$, the
support of $\beta^{(t)}$ is $\bigcup_{r=1}^{t-1} S^{(r)}$, which is an unbalanced set.

The BalSep algorithm exactly parallels the RLE algorithm, introducing only two fundamental
changes. First, we use the embedding given by the AHK random walk $P^{(t)}$ in place of the
eigenvector to find cuts in $G$ or in a residual graph. Secondly, rather than fully removing unbalanced
low-conductance cuts from the graph, we modify $\beta^{(t)}$ at every iteration $t$, so $P^{(t+1)}$ at the
next iteration mixes across the unbalanced cuts found so far.
5.4 Analysis



The analysis of BalSep is at heart a modification of the MMWU argument in OV, stated in a 
random-walk language. This modification allows us to deal with the different embedding used 
by BalSep at every iteration with respect to OV. 

In this analysis, the quantity $\Psi(P^{(t)}, V)$ plays the role of potential function. Recall that, from
a random-walk point of view, $\Psi(P^{(t)}, V)$ is the total deviation from stationarity of $P_\tau(\beta^{(t)})$ over
all vertices as starting points. We start by showing that if the potential function is small enough,
we obtain a certificate that no $b$-balanced cut of conductance at most $\gamma$ exists. In the second step,
we show that, if an unbalanced cut $S^{(t)}$ of low conductance is found, the potential decreases by a
constant fraction. Unless explicitly stated otherwise, all proofs are found in Section [531 

Potential Guarantee. 

We argue that, if $\Psi(P^{(t)}, V)$ is sufficiently small, it must be the case that $G$ has no $b$-balanced cut of
conductance less than $\gamma$. A similar result is implicit in OV. This theorem has a simple explanation
in terms of the AHK random walk $P^{(t)}$. Notice that $P^{(t)}$ is accelerated only on a small unbalanced
set $S$. Hence, if a balanced cut of conductance less than $\gamma$ existed, its convergence could not be
greatly helped by the acceleration over $S$. Then, if $P^{(t)}$ is still mixing very well, no such balanced
cut can exist.

Lemma 5.6 Let S = [j\^^S^'\ IfY{P^'l V) < ^, and vol(S) <c-lm< Vioo ■ 2m, then 

Moreover, this implies that no b-balanced cut of conductance less than 7 exists in G. 
The Deviation of an Unbalanced Cut. 

In the next step, we show that, if the walk has not mixed sufficiently, w.h.p. the embedding
$\{v_i^{(t)}\}_{i \in V}$, computed by BalSep, has low quadratic form with respect to the Laplacian of $G$. From
an SDP-rounding perspective, this means that the embedding can be used to recover cuts of value
close to $\gamma$. This part of the analysis departs from that of OV, as we use our modified definition of
the embedding.

Lemma 5.7 7/Y(p('), V) > \, then w.h.p. L • X^^) < 0(7) • L(Ky) • xW. 

This guarantee on the embedding allows us to apply SDP-rounding techniques in the subroutine
FindCut. The following result is implicit in [25]. Its proof appears in Section 5.7 for completeness.

Theorem 5.8 Consider an embedding $\{v_i \in \mathbb{R}^d\}_{i \in V}$ with Gram matrix $X$ such that $L \bullet X \le \alpha L(K_V) \bullet X$,
for $\alpha > 0$. On input $(G, b, \alpha, \{v_i\}_{i \in V})$, FindCut runs in time $\tilde{O}(md)$ and w.h.p. outputs a cut $C$
with $\phi(C) \le O(\sqrt{\alpha})$. Moreover, there is a constant $c = \Omega(b) \le \frac{b}{100}$ such that either $C$ is $c$-balanced or
$$\sum_{i \in C} d_i R_i \bullet X \ge \frac{2}{3} \cdot L(K_V) \bullet X.$$

The following corollary is a simple consequence of Lemma 5.7 and Theorem 5.8.

Corollary 5.9 At iteration $t$ of BalSep, if $\Psi(P^{(t)}, V) \ge \frac{1}{n}$ and $S^{(t)}$ is not $c$-balanced, then w.h.p.
$$\Psi(P^{(t)}, S^{(t)}) \ge \frac{1}{2} \cdot \Psi(P^{(t)}, V).$$

In words, at iteration $t$ of BalSep, the cut $S^{(t)}$ must either be $c$-balanced or be an unbalanced
cut that contributes a large constant fraction of the total deviation of $P^{(t)}$ from the stationary
distribution. In this sense, $S^{(t)}$ is the main reason for the failure of $P^{(t)}$ to achieve better mixing. To
eliminate this obstacle and drive the potential further down, $P^{(t)}$ is updated to $P^{(t+1)}$ by
accelerating the convergence to stationarity from all vertices in $S^{(t)}$. Formally, this is achieved by adding
weighted stars rooted at all vertices in $S^{(t)}$ to the transition-rate matrix of the AHK random
walk $P^{(t)}$.

Potential Reduction. 

The next theorem crucially exploits the stability of the process $\mathcal{H}(\beta^{(t)})$ and Corollary 5.9 to show
that the potential decreases by a constant fraction at every iteration in which an unbalanced cut is
found. More precisely, the theorem shows that accelerating the convergence from $S^{(t)}$ at iteration $t$
of BalSep has the effect of eliminating at least a constant fraction of the total deviation due to $S^{(t)}$.
The proof is a simple application of the Golden-Thompson inequality (Lemma 5.17) and mirrors the main
step in the MMWU analysis.

Theorem 5.10 At iteration $t$ of BalSep, if $\Psi(P^{(t)}, V) \ge \frac{1}{n}$ and $S^{(t)}$ is not $c$-balanced, then w.h.p.
$$\Psi(P^{(t+1)}, V) \le \Psi(P^{(t)}, V) - \frac{1}{3} \cdot \Psi(P^{(t)}, S^{(t)}) \le \frac{5}{6} \cdot \Psi(P^{(t)}, V).$$

We are now ready to prove Theorem 1.1 and Theorem 3.1 by applying Theorem 5.10 to show
that after $O(\log n)$ iterations, the potential must be sufficiently low to yield the required certificate
according to Lemma 5.6.

Proof: [Proof of Theorem 1.1] If BalSep outputs a cut $S$ in Step 4, by construction, we have that
$\phi(S) \le O(\sqrt{\gamma})$ and $S$ is $\Omega(b)$-balanced. Alternatively, at iteration $t$, if $L(K_V) \bullet X^{(t)} \le \frac{1+\varepsilon}{n}$, we
have by Lemma 5.18 that
$$\Psi(P^{(t)}, V) = L(K_V) \bullet D^{-1} P_{2\tau}(\beta^{(t)}) \le \frac{1}{1-\varepsilon} \cdot L(K_V) \bullet X^{(t)} \le \frac{1+\varepsilon}{1-\varepsilon} \cdot \frac{1}{n} \le \frac{4}{3n}.$$
Therefore, by Lemma 5.6 we have a certificate that no $b$-balanced cut of conductance less than $\gamma$
exists in $G$. Otherwise, we must have $L(K_V) \bullet X^{(t)} \ge \frac{1+\varepsilon}{n}$, which, by Lemma 5.18, implies that
$$\Psi(P^{(t)}, V) \ge \frac{1}{n}.$$
Then, by Lemma 5.7 and Theorem 5.8, we have w.h.p. that FindCut does not fail and outputs a
cut $S^{(t)}$ with $\phi(S^{(t)}) \le O(\sqrt{\gamma})$. As BalSep has not terminated in Step 4, it must be the case that
$S^{(t)}$ is not $c$-balanced and, by Theorem 5.10, we obtain that w.h.p. $\Psi(P^{(t+1)}, V) \le \frac{5}{6} \cdot \Psi(P^{(t)}, V)$.
Now,
$$\Psi(P^{(1)}, V) = L(K_V) \bullet D^{-1} P_{2\tau}(0) \le I \bullet P_{2\tau}(0) \le n.$$
Hence, after $2 \log n / \log(6/5) \le 12 \log n = T$ iterations, w.h.p. we have that $\Psi(P^{(T)}, V) \le \frac{1}{n}$ and, by
Lemma 5.6, no $b$-balanced cut of conductance less than $\gamma$ exists.
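
The iteration count used in the last step can be checked numerically. The following is a minimal sketch, assuming natural logarithms and a potential that shrinks by exactly the factor 5/6 in every iteration; the function names are ours and are not part of BalSep.

```python
import math


def iterations_to_certificate(n: int, shrink: float = 5.0 / 6.0) -> int:
    """Count iterations until a potential starting at n drops to at most 1/n."""
    potential = float(n)
    steps = 0
    while potential > 1.0 / n:
        potential *= shrink  # one unbalanced-cut iteration (Theorem 5.10)
        steps += 1
    return steps


def iteration_budget(n: int) -> float:
    """The budget T = 12 log n from the proof (natural log is an assumption)."""
    return 12.0 * math.log(n)
```

For every n tried, the simulated count stays below the budget, which reflects the inequality 2 / log(6/5) < 12.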

We now consider the running time required by the algorithm at every iteration. In Step 1,
we compute $k = O(\log n)$ matrix-exponential-vector products, each with a unit vector $u$, using the
ExpV algorithm based on the Spielman-Teng solver, given in Theorem 3.2. This application of
Theorem 3.2 is explained in Section 3. By the definition of $\beta^{(t)}$, at iteration $t$ we have:
$$\|HMH\| \le \left\| \tau D^{-1/2} (L + 2 \cdot 72\gamma \cdot D) D^{-1/2} \right\| \le O(\tau) = \mathrm{poly}(n).$$
Moreover, it is easy to see that our argument is robust up to an error $\delta = 1/\mathrm{poly}(n)$ in this
computation and the sparsity of $M$ is $O(m)$, so that the running time of a single matrix-exponential-vector
product is $\tilde{O}(m)$. Given the embedding produced by Step 1, $L(K_V) \bullet X^{(t)}$ can be computed in time
$O(nk) = \tilde{O}(n)$ by computing the distances $\|v_i^{(t)} - v_{\mathrm{avg}}^{(t)}\|^2$ for all $i \in V$. By Theorem 5.8, Step 3 runs
in time $\tilde{O}(mk) = \tilde{O}(m)$. Finally, both Steps 4 and 5 can be performed in time $\tilde{O}(m)$. As there are at
most $O(\log n)$ iterations, the theorem follows. ■

Theorem 3.1 is proved similarly. It suffices to show that a single matrix-exponential-vector product
requires time $\tilde{O}(m/\sqrt{\gamma})$.

Proof: [Proof of Theorem 3.1] Using the algorithm of Theorem 1.4, we obtain that $\|A\| \le O(\tau)$,
so that $k = \tilde{O}(\sqrt{\tau}) = \tilde{O}(1/\sqrt{\gamma}) \le \tilde{O}(n)$. Hence, the running time of a single computation for this
method is $\tilde{O}(m\sqrt{\tau}) = \tilde{O}(m/\sqrt{\gamma})$. ■

5.5 Proofs 

In this section we provide the proofs from Section 5.4. We start with some preliminaries.

5.5.1 Preliminaries 

Vector and Matrix Notation. 

For a symmetric matrix $A$, denote by $\lambda_i(A)$ the $i$-th smallest eigenvalue of $A$. For a vector $x \in \mathbb{R}^n$,
let $\mathrm{supp}(x)$ be the set of vertices where $x$ is not zero.

Fact 5.11 $L \preceq 2 \cdot D$ and $L(S_i) \preceq 2 \cdot D$.

Fact 5.12 For all $i \in V$, $L(S_i) = \frac{d_i}{2m} \cdot L(K_V) + d_i R_i$. In particular, $L(S_i) \succeq d_i R_i$.

Notation for BalSep.
At iteration $t$, we denote
$$C^{(t)} := D^{-1/2} \Big( L + \sum_{i \in V} \beta_i^{(t)} L(S_i) \Big) D^{-1/2}.$$

The following are useful facts to record about $C^{(t)}$:

Fact 5.13 The vector $D^{1/2} \mathbf{1}$ is the eigenvector of $C^{(t)}$ with smallest eigenvalue 0.

Fact 5.14 $C^{(t)} \preceq O(1) \cdot I$.

5.5.2 Useful Lemmata

Lemma 5.15 $\Psi(P^{(t)}, V) = L(K_V) \bullet D^{-1} P_{2\tau}(\beta^{(t)}) = \mathrm{Tr}(e^{-2\tau C^{(t)}}) - 1$.
Proof: By definition, we have
$$\Psi(P^{(t)}, V) = L(K_V) \bullet D^{-1} P_{2\tau}(\beta^{(t)}) = L(K_V) \bullet D^{-1} e^{-2\tau Q(\beta^{(t)})} = L(K_V) \bullet D^{-1/2} e^{-2\tau C^{(t)}} D^{-1/2}.$$
Using Fact 5.1 and the cyclic property of the trace function, we obtain
$$L(K_V) \bullet D^{-1/2} e^{-2\tau C^{(t)}} D^{-1/2} = \Big( I - \frac{1}{2m} D^{1/2} \mathbf{1}\mathbf{1}^T D^{1/2} \Big) \bullet e^{-2\tau C^{(t)}}.$$
Finally, by Fact 5.13, we must have that the right-hand side equals $\mathrm{Tr}(e^{-2\tau C^{(t)}}) - 1$, as required. ■

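The following lemma is a simple consequence of the convexity of $e^{-x}$. It is proved in [23].

Lemma 5.16 For a symmetric matrix $A \in \mathbb{R}^{n \times n}$ such that $\rho I \succeq A \succeq 0$ and $\tau > 0$, we have
$$e^{-\tau A} \preceq I - \frac{1 - e^{-\tau\rho}}{\rho} \cdot A.$$

The following are standard lemmata.

Lemma 5.17 (Golden-Thompson inequality) Let $X, Y \in \mathbb{R}^{n \times n}$ be symmetric matrices. Then,
$$\mathrm{Tr}(e^{X+Y}) \le \mathrm{Tr}(e^{X} e^{Y}).$$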
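
As a sanity check of Lemma 5.17, the inequality can be tested on small symmetric matrices. The sketch below is ours and rests on one assumption: the matrix exponential is computed by a truncated Taylor series, which is adequate only for matrices of small norm such as the 2 x 2 examples used here.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_exp(a: Matrix, terms: int = 40) -> Matrix:
    """Truncated Taylor series for e^a (assumes the norm of a is small)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = mat_add(result, term)
    return result


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def golden_thompson_gap(x: Matrix, y: Matrix) -> float:
    """Return Tr(e^x e^y) - Tr(e^(x+y)); non-negative for symmetric x, y."""
    lhs = trace(mat_exp(mat_add(x, y)))
    rhs = trace(mat_mul(mat_exp(x), mat_exp(y)))
    return rhs - lhs
```

When the two matrices commute (for instance, when they are equal), the gap vanishes up to rounding error.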

Lemma 5.18 (Johnson-Lindenstrauss) Given an embedding $\{v_i \in \mathbb{R}^n\}_{i \in V}$, $V = [n]$, let $u_1, u_2, \ldots, u_k$
be vectors sampled independently uniformly from the $(n-1)$-dimensional sphere of radius $\sqrt{n/k}$. Let $U$ be
the $k \times n$ matrix having the vector $u_i$ as its $i$-th row and let $\tilde{v}_i = U v_i$. Then, for $k_\varepsilon = O(\log n / \varepsilon^2)$, for all $i, j \in V$,
$$(1 - \varepsilon) \cdot \|v_i - v_j\|^2 \le \|\tilde{v}_i - \tilde{v}_j\|^2 \le (1 + \varepsilon) \cdot \|v_i - v_j\|^2.$$

5.5.3 Proof of Lemma 5.6
Proof:

Let $S = \bigcup_{i=1}^{t} S^{(i)}$ and set $\beta = \beta^{(t)}$. By Lemma 5.15, we have $\mathrm{Tr}(e^{-2\tau C^{(t)}}) - 1 \le \frac{4}{3n}$. Hence,
$\lambda_{n-1}(e^{-2\tau C^{(t)}}) \le \frac{4}{3n}$, which implies that, by taking logs,
$$\lambda_2(C^{(t)}) \ge \frac{\log(3n/4)}{2\tau} \ge 3\gamma.$$
This can be rewritten in matrix terms, by Fact 5.1 and Fact 5.13 and because $\mathrm{supp}(\beta) = S$ by the
construction of BalSep:
$$L + \sum_{i \in S} \beta_i L(S_i) \succeq 3\gamma \cdot L(K_V), \qquad (1)$$
which proves the first part of the Lemma.
For the second part, we start by noticing that, for $i \in S$, $\beta_i \le 72\gamma \cdot t/T \le 72\gamma$. Now for any
$b$-balanced cut $U$, with $\mathrm{vol}(U) \le \mathrm{vol}(\bar{U})$, consider the vector $x_U$ defined as



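$$(x_U)_i = \begin{cases} \sqrt{\dfrac{1}{2m} \cdot \dfrac{\mathrm{vol}(\bar{U})}{\mathrm{vol}(U)}} & \text{for } i \in U, \\[2ex] -\sqrt{\dfrac{1}{2m} \cdot \dfrac{\mathrm{vol}(U)}{\mathrm{vol}(\bar{U})}} & \text{for } i \in \bar{U}. \end{cases}$$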



Applying the guarantee of Equation [H we obtain 



Notice that 



$$x_U^T L x_U + 72\gamma \cdot \sum_{i \in S} x_U^T L(S_i) x_U \ge 3\gamma \cdot x_U^T L(K_V) x_U.$$



Hence, our guarantee becomes 



As vol(S) < Vloo • 2m < voi(u)/ioo, we have > 7. 



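5.5.4 Proof of Lemma 5.7

Proof: Consider $L \bullet D^{-1} P_{2\tau}(\beta^{(t)}) = L \bullet D^{-1/2} e^{-2\tau C^{(t)}} D^{-1/2}$. Using the cyclic property of trace and
the definition of $C^{(t)}$, we have that
$$L \bullet D^{-1} P_{2\tau}(\beta^{(t)}) \le C^{(t)} \bullet e^{-2\tau C^{(t)}}.$$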



We now consider the spectrum of $C^{(t)}$. By Fact 5.13, the smallest eigenvalue is 0. Let the remaining
eigenvalues be $\lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$. Then, $C^{(t)} \bullet e^{-2\tau C^{(t)}} = \sum_{i=2}^{n} \lambda_i e^{-2\tau\lambda_i}$. We will analyze these
eigenvalues in two groups. For the first group, we consider eigenvalues smaller than $24\gamma$ and use
Lemma 5.15 together with the fact that $\gamma \ge 1/n^2$:

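For the remaining eigenvalues, we have, by Fact 5.14:
$$\sum_{i : \lambda_i > 24\gamma} \lambda_i e^{-2\tau\lambda_i} \le O(1) \cdot n \cdot e^{-48\tau\gamma} \le \frac{O(1)}{n^3}.$$
Combining these two parts, we have:
$$L \bullet D^{-1} P_{2\tau}(\beta^{(t)}) \le C^{(t)} \bullet e^{-2\tau C^{(t)}} \le O(1) \cdot \sum_{i : \lambda_i \le 24\gamma} \lambda_i e^{-2\tau\lambda_i} \le O(\gamma) \cdot L(K_V) \bullet D^{-1} P_{2\tau}(\beta^{(t)}).$$
Now, we apply the Johnson-Lindenstrauss Lemma (Lemma 5.18) to both sides of this inequality
to obtain:
$$L \bullet X^{(t)} \le O(\gamma) \cdot L(K_V) \bullet X^{(t)}.$$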



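5.5.5 Proof of Corollary 5.9

Proof: By Lemma 5.7 and Theorem 5.8, we have that $S^{(t)}$ w.h.p. is either $c$-balanced or
$\sum_{i \in S^{(t)}} d_i R_i \bullet X^{(t)} \ge \frac{2}{3} \cdot L(K_V) \bullet X^{(t)}$. By Lemma 5.18 and as $\frac{1+\varepsilon}{1-\varepsilon} \le \frac{4}{3}$, we have w.h.p.:
$$\Psi(P^{(t)}, S^{(t)}) = \sum_{i \in S^{(t)}} d_i R_i \bullet D^{-1} P_{2\tau}(\beta^{(t)}) \ge \frac{1}{1+\varepsilon} \cdot \Big( \sum_{i \in S^{(t)}} d_i R_i \bullet X^{(t)} \Big) \ge \frac{1}{1+\varepsilon} \cdot \frac{2}{3} \cdot L(K_V) \bullet X^{(t)}$$
$$\ge \frac{1-\varepsilon}{1+\varepsilon} \cdot \frac{2}{3} \cdot L(K_V) \bullet D^{-1} P_{2\tau}(\beta^{(t)}) = \frac{1-\varepsilon}{1+\varepsilon} \cdot \frac{2}{3} \cdot \Psi(P^{(t)}, V) \ge \frac{1}{2} \cdot \Psi(P^{(t)}, V).$$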



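5.5.6 Proof of Theorem 5.10

Proof: By Lemma 5.15 and the Golden-Thompson inequality in Lemma 5.17,
$$\Psi(P^{(t+1)}, V) = \mathrm{Tr}(e^{-2\tau C^{(t+1)}}) - 1 \le \mathrm{Tr}\Big( e^{-2\tau C^{(t)}} \, e^{-2\tau \cdot \frac{72\gamma}{T} D^{-1/2} (\sum_{i \in S^{(t)}} L(S_i)) D^{-1/2}} \Big) - 1.$$
We now apply Lemma 5.16 to the second term under trace. To do this we notice that $\sum_{i \in S^{(t)}} L(S_i) \preceq 2 L(K_V) \preceq 2D$, so that
$$e^{-2\tau \cdot \frac{72\gamma}{T} D^{-1/2} (\sum_{i \in S^{(t)}} L(S_i)) D^{-1/2}} \preceq I - \frac{1 - e^{-288\gamma\tau/T}}{2} \cdot D^{-1/2} \Big( \sum_{i \in S^{(t)}} L(S_i) \Big) D^{-1/2}.$$
Hence, we obtain
$$\Psi(P^{(t+1)}, V) \le \mathrm{Tr}\Big( e^{-2\tau C^{(t)}} \Big( I - \frac{1 - e^{-288\gamma\tau/T}}{2} \cdot D^{-1/2} \Big( \sum_{i \in S^{(t)}} L(S_i) \Big) D^{-1/2} \Big) \Big) - 1.$$
Applying the cyclic property of trace, we get
$$\Psi(P^{(t+1)}, V) \le \Psi(P^{(t)}, V) - \frac{1 - e^{-288\gamma\tau/T}}{2} \cdot \sum_{i \in S^{(t)}} L(S_i) \bullet D^{-1} P_{2\tau}(\beta^{(t)}).$$
Next, we use Fact 5.12 to replace $L(S_i)$ by $d_i R_i$ and notice that $288 \cdot \gamma\tau/T = 2$:
$$\Psi(P^{(t+1)}, V) \le \Psi(P^{(t)}, V) - \frac{1 - e^{-2}}{2} \cdot \sum_{i \in S^{(t)}} d_i R_i \bullet D^{-1} P_{2\tau}(\beta^{(t)}).$$
Then, we apply the definition of $\Psi(P^{(t)}, S^{(t)})$:
$$\Psi(P^{(t+1)}, V) \le \Psi(P^{(t)}, V) - \frac{1}{3} \cdot \Psi(P^{(t)}, S^{(t)}).$$
Finally, by Corollary 5.9, we know that w.h.p. $\Psi(P^{(t)}, S^{(t)}) \ge \frac{1}{2} \cdot \Psi(P^{(t)}, V)$ and the required
result follows. ■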
5.6 SDP Interpretation

OV designed an algorithm that outputs either an $\Omega(b)$-balanced cut of conductance $O(\sqrt{\gamma})$ or a
certificate that no $b$-balanced cut of conductance $\gamma$ exists in time $\tilde{O}(m/\gamma^2)$. This algorithm uses the
MMWU of Arora and Kale to approximately solve an SDP formulation of the BS problem.
The main technical contribution of their work is the routine FindCut (implicit in their Oracle),
which takes the role of an approximate separation oracle for their SDP. In an iteration of their
algorithm, OV use the MMWU update to produce a candidate SDP-solution $Y^{(t)}$. In one scenario,
$Y^{(t)}$ does not have sufficiently low Laplacian objective value:
$$L \bullet Y^{(t)} \ge \Omega(\gamma) \, L(K_V) \bullet Y^{(t)}. \qquad (2)$$
In this case, the MMWU uses Equation (2) to produce a candidate solution $Y^{(t+1)}$ with lower
objective value. Otherwise, FindCut is run on the embedding corresponding to $Y^{(t)}$. By Theorem 5.8,
this yields either a cut of the required balance or a dual certificate that $Y^{(t)}$ is infeasible. This
certificate has the form
$$\gamma \cdot \sum_{i \in S^{(t)}} d_i R_i \bullet Y^{(t)} \ge \Omega(\gamma) \, L(K_V) \bullet Y^{(t)} \qquad (3)$$
and is used by the update to construct the next candidate $Y^{(t+1)}$. The number of iterations
necessary is determined by the width of the two possible updates described above. A simple calculation
shows that the width of the update for Equation (2) is $O(1)$, while for Equation (3) it is only $O(\gamma)$.
Hence, the overall width is $O(1)$, implying that $O(\log n / \gamma)$ iterations are necessary for the algorithm
of OV to produce a dual certificate that the SDP is infeasible and therefore no $b$-balanced cut of
conductance $\gamma$ exists.

Our modification of the update is based on changing the starting candidate solution from
$Y^{(1)}$ to $X^{(1)} \propto D^{-1/2} e^{-2\tau D^{-1/2} L D^{-1/2}} D^{-1/2}$. In Lemma 5.6 and Lemma 5.7 we show that this
modification implies that all $X^{(t)}$ must now have $L \bullet X^{(t)} \le O(\gamma) \cdot L(K_V) \bullet X^{(t)}$, or else we find a
dual certificate that the SDP is infeasible. This additional guarantee effectively allows us to bypass
the update of Equation (2) and only work with updates of the form given in Equation (3). As a result,
our width is now $O(\gamma)$ and we only require $O(\log n)$ iterations.

Another way to interpret our result is that all possible $\tau \approx \log n / \gamma$ updates of the form of
Equation (2) in the algorithm of OV are regrouped into a single step, which is performed at the beginning
of the algorithm.

5.7 The FindCut Subroutine 

Most of the material in this section appears in [25] or in [23]. We reproduce it here in the language
of this paper for completeness. The constants in these proofs are not optimized.

5.7.1 Preliminaries 

Fact 5.19 For a subset $S \subseteq V$,
Proof: By Fact 5.12,
$$\sum_{i \in S} L(S_i) = \frac{\mathrm{vol}(S)}{2m} \cdot L(K_V) + \sum_{i \in S} d_i R_i.$$



Moreover, by the definitions it is clear that 



Combining these two equations, we obtain the required statement. ■ 

The following is a variant of the sweep cut argument of Cheeger's inequality |[TT]| , tailored to 
ensure that a constant fraction of the variance of the embedding is contained inside the output cut. 



Lemma 5.20 Let $x \in \mathbb{R}^n$, $x \ge 0$, such that $x^T L x \le \lambda$ and $\mathrm{vol}(\mathrm{supp}(x)) \le 2m/2$. Relabel the vertices so
that $x_1 \ge x_2 \ge \cdots \ge x_{z-1} > 0$ and $x_z = \cdots = x_n = 0$. For $i \in [z-1]$, denote by $S_i \subseteq V$ the sweep cut
$\{1, 2, \ldots, i\}$. Further, assume that $\sum_{i=1}^{n} d_i x_i^2 \le 1$, and, for some fixed $k \in [z-1]$, $\sum_{i=k}^{n} d_i x_i^2 \ge \sigma$. Then,
there is a sweep cut $S_h$ of $x$ such that $z - 1 \ge h \ge k$ and $\phi(S_h) \le \frac{1}{\sigma} \cdot \sqrt{2\lambda}$.
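
To make the notion of a sweep cut concrete, the following sketch enumerates all prefix cuts of a vector on an unweighted graph and returns the one of least conductance. It is an illustration under our own assumptions: conductance is taken to be the number of cut edges divided by the smaller of the two volumes, the graph is simple and given as an edge list, and `best_sweep_cut` is a hypothetical helper, not the procedure analyzed in the lemma (which also tracks how much of the weighted mass the cut contains).

```python
from typing import List, Tuple


def best_sweep_cut(n: int, edges: List[Tuple[int, int]],
                   x: List[float]) -> Tuple[float, List[int]]:
    """Return (conductance, vertex set) of the best prefix cut of x."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total_vol = 2 * len(edges)
    order = sorted(range(n), key=lambda i: -x[i])  # decreasing x
    in_set = [False] * n
    cut = 0
    vol = 0
    prefix: List[int] = []
    best: Tuple[float, List[int]] = (float("inf"), [])
    for i in order[:-1]:  # skip the trivial cut containing every vertex
        in_set[i] = True
        prefix.append(i)
        vol += len(adj[i])
        for j in adj[i]:
            # An edge to a vertex already inside stops being cut;
            # an edge to a vertex outside becomes cut.
            cut += -1 if in_set[j] else 1
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best[0]:
            best = (cut / denom, list(prefix))
    return best
```

After sorting, each edge is touched a constant number of times, so the sweep itself takes time linear in the number of edges.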

We will also need the following simple fact. 

Fact 5.21 Given $v, u, t \in \mathbb{R}^h$, $(\|v - t\| - \|u - t\|)^2 \le \|v - u\|^2$.



5.7.2 Roundable Embeddings and Projections 

The following definition of roundable embedding captures the case in which a vector embedding of
the vertices $V$ highlights a balanced cut of conductance close to $\alpha$ in $G$. Intuitively, in a roundable
embedding, a constant fraction of the total variance is spread over a large set $R$ of vertices.

Definition 5.22 (Roundable Embedding) Given an embedding $\{v_i\}_{i \in V}$ with Gram matrix $X$, denote
by $\Psi$ the total variance of the embedding: $\Psi = L(K_V) \bullet X$. Also, let $R = \{ i \in V : \|v_i - v_{\mathrm{avg}}\|^2 \le 32 \cdot \frac{1-b}{b} \cdot \frac{\Psi}{2m} \}$.
We say that $\{v_i\}_{i \in V}$ is roundable for $(G, b, \alpha)$ if:

• $L \bullet X \le \alpha \Psi$,

• $L(K_R) \bullet X \ge \frac{\Psi}{128}$.

A roundable embedding can be converted into a balanced cut of conductance $O(\sqrt{\alpha})$ by using
a standard projection rounding, which is a simple extension of an argument already appearing
in ISl and fZl. The rounding procedure ProjRound is described in Figure 2 for completeness. It
is analyzed in [25] and [23], where the following theorem is proved.

Theorem 5.23 (Rounding Roundable Embeddings) [25, 23] If $\{v_i \in \mathbb{R}^h\}_{i \in V}$ is roundable for $(G, b, \alpha)$,
then ProjRound$(\{v_i\}_{i \in V}, b)$ produces an $\Omega(b)$-balanced cut of conductance $O(\sqrt{\alpha})$ with high
probability in time $\tilde{O}(nh + m)$.

1. Input: An embedding $\{v_i \in \mathbb{R}^h\}_{i \in V}$, $b \in (0, 1/2]$.

2. Let $c = \Omega(b) < \frac{b}{100}$ be a constant, fixed in the proof of Theorem 5.23 in [25].

3. For $t = 1, 2, \ldots, O(\log n)$:

a. Pick a unit vector $u$ uniformly at random from $S^{h-1}$ and let $x \in \mathbb{R}^n$ with $x_i = \sqrt{h} \cdot u^T v_i$.

b. Sort the vector $x$. Assume w.l.o.g. that $x_1 \ge x_2 \ge \cdots \ge x_n$. Define $S_i = \{ j \in [n] : x_j \ge x_i \}$.

c. Let $S^{(t)} = (S_i, \bar{S}_i)$ be the cut which minimizes $\phi(S_i)$ among sweep cuts for which $\mathrm{vol}(S_i) \in [c \cdot 2m, (1 - c) \cdot 2m]$.

4. Output: The cut $S^{(t)}$ of least conductance over all choices of $t$.

Figure 2: ProjRound

5.7.3 Description of FindCut 

In this subsection we describe the subroutine FindCut and prove Theorem 5.8.

Theorem 5.24 (Theorem 5.8 Restated) Consider an embedding $\{v_i \in \mathbb{R}^d\}_{i \in V}$ with Gram matrix $X$
such that $L \bullet X \le \alpha L(K_V) \bullet X$, for $\alpha > 0$. On input $(G, b, \alpha, \{v_i\}_{i \in V})$, FindCut runs in time
$\tilde{O}(md)$ and w.h.p. outputs a cut $C$ with $\phi(C) \le O(\sqrt{\alpha})$. Moreover, there is a constant $c = \Omega(b) \le \frac{b}{100}$
such that either $C$ is $c$-balanced or
$$\sum_{i \in C} d_i R_i \bullet X \ge \frac{2}{3} \cdot L(K_V) \bullet X.$$

Proof: By Markov's inequality, $\mathrm{vol}(\bar{R}) \le \frac{b}{32 \cdot (1-b)} \cdot 2m \le \frac{b}{16} \cdot 2m \le \frac{1}{32} \cdot 2m$. By assumption,
Case 1 cannot take place. If Case 2 holds, then the embedding is roundable: by Theorem 5.23,
ProjRound outputs an $\Omega(b)$-balanced cut $C$ with conductance $O(\sqrt{\alpha})$. If this is not the case, we are
in Case 3.

We then have L{Kr) < Y/128 and, by Fact l5l9l 

S a 2m V 128^ - V 128,' 

>(l-^]-Y. 

It must be the case that R = Sg for some g G [n], with ^ < z as vol(Sg) < vol(Sz). Let k < z 
be the the vertex in R such that jj-^^ djrj > 3/4 ■ (1 - 5/i28) and L^^^djrj > 1/4 • (1 - V128). By 
the definition of z, we have k < g < z and rl < Vfc • "^/im < 8 • ■ ^/im. Hence, we have 

rz < 1/2 • r„ for all i > g. Define the vector x as = (r; — r^) for i G Sz and r, = for i iSz - Notice 
1. Input: Instance graph $G$, balance $b$, conductance value $\alpha$, and embedding $\{v_i\}_{i \in V}$, with
Gram matrix $X$.

2. Let $r_i = \|v_i - v_{\mathrm{avg}}\|$ for all $i \in V$. Denote $\Psi = L(K_V) \bullet X$ and define the set
$R = \{ i \in V : r_i^2 \le 32 \cdot \frac{1-b}{b} \cdot \frac{\Psi}{2m} \}$.

3. Case 1: If $L \bullet X > \alpha \Psi$, output FAIL and terminate.

4. Case 2: If $L(K_R) \bullet X \ge \frac{\Psi}{128}$, the embedding $\{v_i\}_{i \in V}$ is roundable for $(G, b, \alpha)$. Run
ProjRound, output the resulting cut and terminate.

5. Case 3: Relabel the vertices of $V$ such that $r_1 \ge r_2 \ge \cdots \ge r_n$ and let $S_i = \{1, \ldots, i\}$
be the sweep cut of $r$. Let $z$ be the smallest index such that $\mathrm{vol}(S_z) \ge \frac{b}{4} \cdot 2m$. Output the
most balanced sweep cut $C$ among $\{S_1, \ldots, S_{z-1}\}$ such that $\phi(C) \le 40 \cdot \sqrt{\alpha}$.

Figure 3: FindCut

that: 



{i,j}&E {;,/}GE 
FactEm _ „ „2 

{y}GE 

Also, X > and vol(supp(x)) < V4 ■ 2m < 2"'/2, by the definition of z. Moreover, 

! = 1 ! = 1 ( = 1 

and 



i=k 

i=k 

> 1/16 ■ (1 - 5/128) ■ Y > 1/20 • Y. 

Hence we can now apply Lemma 5.20 to the vector $\frac{1}{\sqrt{\Psi}} \cdot x$. This shows that there exists a sweep
cut $S_h$ with $z > h \ge k$, such that $\phi(S_h) \le 40 \cdot \sqrt{\alpha}$. It also shows that $C$, as defined in Figure 3, must
exist. Moreover, it must be the case that $S_k \subseteq S_h \subseteq C$. As $h \ge k$, we have
$$\sum_{i \in C} d_i R_i \bullet X = \sum_{i \in C} d_i r_i^2 \ge \sum_{i=1}^{k} d_i r_i^2 \ge \frac{3}{4} \Big( 1 - \frac{5}{128} \Big) \cdot \Psi \ge \frac{2}{3} \cdot \Psi = \frac{2}{3} \cdot L(K_V) \bullet X.$$

Finally, using the fact that $\{v_i\}_{i \in V}$ is embedded in $d$ dimensions, we can compute $L \bullet X$ in
time $O(dm)$. Moreover, $L(K_V) \bullet X$ can be computed in time $O(nd)$ by using the decomposition
$L(K_V) \bullet X = \sum_{i \in V} d_i \|v_i - v_{\mathrm{avg}}\|^2$. By the same argument, we can compute $L(K_R) \bullet X$ in time
$O(nd)$. The sweep cut over $r$ takes time $\tilde{O}(m)$. And, by Theorem 5.23, ProjRound runs in time
$\tilde{O}(md)$. Hence, the total running time is $\tilde{O}(md)$. ■
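
The decomposition used for the running time can be illustrated on a toy embedding. The sketch below rests on an assumption about notation: $L(K_V)$ is taken to be the Laplacian of the complete graph with edge weights $d_i d_j / 2m$, so that $L(K_V) \bullet X = \frac{1}{2m} \sum_{i<j} d_i d_j \|v_i - v_j\|^2$, and $v_{\mathrm{avg}}$ is the degree-weighted mean. Both function names are ours.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_variance_pairwise(degrees: List[float], vectors: List[Vector]) -> float:
    """Quadratic-time evaluation over all pairs of vertices."""
    two_m = sum(degrees)
    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += degrees[i] * degrees[j] * sq_dist(vectors[i], vectors[j])
    return total / two_m


def total_variance_centered(degrees: List[float], vectors: List[Vector]) -> float:
    """O(nd) evaluation via distances to the degree-weighted average."""
    two_m = sum(degrees)
    dim = len(vectors[0])
    avg = [sum(d * v[c] for d, v in zip(degrees, vectors)) / two_m
           for c in range(dim)]
    return sum(d * sq_dist(v, avg) for d, v in zip(degrees, vectors))
```

Both routines return the same value; only the second one meets the $O(nd)$ bound quoted in the proof.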

6 Computing exp(-A)v

In this section, we describe procedures for approximating $\exp(-A)v$ up to an $\ell_2$ error of $\delta \|v\|$,
given a symmetric PSD matrix $A$ and a vector $v$ (w.l.o.g., $\|v\| = 1$). In particular, we give the
required procedures and proofs for Theorem 1.2, Theorem 1.4 and Theorem 3.2. For this section, we
will assume the upper bound from Theorem 7.1 (which is a more precise version of Theorem 1.5),
regarding polynomials approximating $e^{-x}$. Discussion about this theorem and the proofs are
included in Section 7. We restate the basic definitions used in this section for completeness.

Definitions. 

We will always work with square $n \times n$ matrices over $\mathbb{R}$. For a matrix $M$, abusing notation, we
will denote its exponential by $\exp(-M)$, which is defined as $\sum_{i \ge 0} \frac{(-1)^i}{i!} M^i$. $\|M\| = \sup_{\|x\| = 1} \|Mx\|$
denotes the spectral norm of $M$. $M$ is said to be Symmetric and Diagonally Dominant (SDD) if
$M_{ij} = M_{ji}$ for all $i, j$ and $M_{ii} \ge \sum_{j \ne i} |M_{ij}|$ for all $i$. $M$ is called Upper Hessenberg if $(M)_{ij} = 0$
for $i > j + 1$. $M$ is called tridiagonal if $M_{ij} = 0$ for $i > j + 1$ and for $j > i + 1$. Let $\Lambda(M)$ denote the
spectrum of a matrix $M$ and let $\lambda_1(M)$ and $\lambda_n(M)$ denote the largest and the smallest eigenvalues
of $M$ respectively. For a matrix $M$, let $m_M$ denote the number of non-zero entries in $M$. Further,
let $t_M$ denote the time required to multiply the matrix $M$ with a given vector $w$. In general $t_M$
depends on how $M$ is given as an input and can be $\Theta(n^2)$. However, it is possible to exploit the
special structure of $M$ if it is given as an input appropriately: it is possible to just multiply the non-zero
entries of $M$, giving $t_M = O(m_M)$. Also, if $M$ is a rank one matrix $ww^T$, where $w$ is known, we can
multiply with $M$ in $O(n)$ time. For any positive integer $k$, let $\Sigma_k$ denote the set of all polynomials
with degree at most $k$. Given a degree $k$ polynomial $p = \sum_{i=0}^{k} a_i \cdot x^i$, the $\ell_1$ norm of $p$, denoted as
$\|p\|_1$, is defined as $\|p\|_1 = \sum_{i \ge 0} |a_i|$.
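
The two structured multiplications mentioned above, and the $\ell_1$ norm of a polynomial, can be sketched as follows. The coordinate-list representation of the sparse matrix and all function names are our own choices for illustration.

```python
from typing import List, Tuple


def sparse_matvec(n: int, entries: List[Tuple[int, int, float]],
                  w: List[float]) -> List[float]:
    """Multiply by a matrix given as (row, column, value) triples.

    The work is proportional to the number of stored entries, i.e. O(m_M).
    """
    out = [0.0] * n
    for i, j, val in entries:
        out[i] += val * w[j]
    return out


def rank_one_matvec(u: List[float], w: List[float]) -> List[float]:
    """Compute (u u^T) w in O(n) time without forming the n x n matrix."""
    inner = sum(a * b for a, b in zip(u, w))
    return [inner * a for a in u]


def poly_l1_norm(coeffs: List[float]) -> float:
    """l1 norm of a polynomial given by its coefficients a_0, ..., a_k."""
    return sum(abs(c) for c in coeffs)
```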

Algorithms for Theorem 1.2, 1.3 and 3.2

Theorem 1.2, 1.3 and 3.2 are based on a common algorithm we describe, called ExpRational (see
Figure 5), which requires a procedure Invert$_A$ with the following guarantee: given a vector $y$, a
positive integer $k$ and $\varepsilon_1 > 0$, Invert$_A(y, k, \varepsilon_1)$ returns a vector $u_1$ such that $\|(I + A/k)^{-1} y - u_1\| \le \varepsilon_1 \|y\|$.
The algorithms for these theorems differ only in their implementation of Invert$_A$. We
prove the following theorem about ExpRational.

Theorem 6.1 (Running Time of ExpRational given Invert$_A$) Given a symmetric p.s.d. matrix $A \succeq 0$,
a vector $v$ with $\|v\| = 1$, an error parameter $0 < \delta \le 1$ and oracle access to Invert$_A$, for parameters
$k = \Theta(\log 1/\delta)$ and $\varepsilon_1 = \exp(-\Theta(k \log k + \log(1 + \|A\|)))$, ExpRational computes a vector $u$ such
that $\|\exp(-A)v - u\| \le \delta$, in time $O(T^{\mathrm{inv}}_{A,k,\varepsilon_1} \cdot k + n \cdot k^2 + k^3)$, where $T^{\mathrm{inv}}_{A,k,\varepsilon_1}$ is the time required by
Invert$_A(\cdot, k, \varepsilon_1)$.

The proof of this theorem appears in Section 6.5. Theorem 1.2 will follow from the above theorem
by using the Spielman-Teng SDD solver to implement the Invert$_A$ procedure (see Section 6.3.1).
For Theorem 3.2, we combine the SDD solver with the Sherman-Morrison formula (for matrix
inverse with rank 1 updates) to implement the Invert$_A$ procedure (see Section 6.4).
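
For readers unfamiliar with the Sherman-Morrison formula, the sketch below shows how a solver for a matrix $A$ can be reused to solve systems in the rank-one update $A + uv^T$, using $(A + uv^T)^{-1} y = A^{-1} y - \frac{v^T A^{-1} y}{1 + v^T A^{-1} u} A^{-1} u$. This is a generic illustration and not the Invert$_A$ procedure of Section 6.4; the solver `solve_a` is a stand-in (in the test it is an exact diagonal solve, where the paper would use an approximate SDD solver).

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve_rank_one_update(solve_a: Callable[[Vector], Vector],
                          u: Vector, v: Vector, y: Vector) -> Vector:
    """Solve (A + u v^T) x = y using two calls to a solver for A."""
    a_inv_y = solve_a(y)
    a_inv_u = solve_a(u)
    denom = 1.0 + dot(v, a_inv_u)
    if abs(denom) < 1e-14:
        raise ValueError("A + u v^T is (numerically) singular")
    scale = dot(v, a_inv_y) / denom
    return [p - scale * q for p, q in zip(a_inv_y, a_inv_u)]
```

Only two solves with $A$ and $O(n)$ extra work are needed, which is why a fast solver for $A$ carries over to its rank-one updates.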

Algorithm for Theorem 1.4

The procedure and proof for Theorem 1.4 is based on the well-known Lanczos method. We give
a description of the Lanczos method (e.g. see [29]) in Figure 4 and give a proof of a well known
theorem about the method that permits us to extend polynomial approximations for a function $f$
over reals to approximating $f$ over matrices (Theorem 6.7). Combining our result on polynomials
approximating $e^{-x}$ from the upper bound in Theorem 7.1 with the theorem about the Lanczos
method, we give a proof of the following theorem that immediately implies Theorem 1.4.

Theorem 6.2 (Running Time Using Lanczos) Given a symmetric p.s.d. matrix $A$, a vector $v$ with
$\|v\| = 1$ and a parameter $0 < \delta \le 1$, for
$$k = O\Big( \sqrt{\max\{\log^2 1/\delta, \ (\lambda_1(A) - \lambda_n(A)) \cdot \log 1/\delta\}} \cdot (\log 1/\delta) \cdot \log\log 1/\delta \Big),$$
and $f(x) = e^{-x}$, the procedure Lanczos computes a vector $u$ such that $\|\exp(-A)v - u\| \le \|\exp(-A)\| \cdot \delta$.
The time taken by Lanczos is $O((n + t_A)k + k^2)$.

Remark 6.3 Note the $k^3$ term in the running time for Theorem 6.1 and the $k^2$ term in the running time
for Theorem 6.2. This is the time required for computing the eigendecomposition of a $(k+1) \times (k+1)$
symmetric matrix. While this process requires $O(k^3)$ time in general, as in Theorem 6.1, in case of
Theorem 6.2 the matrix is tridiagonal and hence the time required is $O(k^2)$ (see [26]).

Organization.

We first describe the Lanczos method and prove some of its properties in Section 6.1. Then, we
give descriptions of the Lanczos and the ExpRational procedures in Section 6.2. Assuming
Theorem 6.1, we give proofs of Theorem 1.2 and Theorem 3.2 in Section 6.3.1 and Section 6.4
respectively by implementing the respective Invert$_A$ procedures. Finally, we give the error analysis
for ExpRational and a proof for Theorem 6.1 in Section 6.5.

6.1 Lanczos Method - From Scalars to Matrices 

Suppose one is given a symmetric PSD matrix $B$, and a function $f : \mathbb{R} \to \mathbb{R}$. Then one can
define $f(B)$ as follows: Let $u_1, \ldots, u_n$ be orthonormal eigenvectors of $B$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Define
$f(B) = \sum_i f(\lambda_i) u_i u_i^T$. Given a vector $v$, we wish to compute $f(B)v$. Since exact computation of
$f(B)$ usually requires diagonalization of $B$, which is costly, we seek an approximation to $f(B)v$.

For a given positive integer $k$, the Lanczos method looks for an approximation to $f(B)v$ of the
form $p(B)v$, where $p$ is a polynomial of degree $k$. Note that for any polynomial $p$ of degree at
most $k$, the vector $p(B)v$ is a linear combination of the vectors $\{v, Bv, \ldots, B^k v\}$. The span of these
vectors is referred to as the Krylov Subspace and is defined below.

Definition 6.4 (Krylov Subspace) Given a matrix $B$ and a vector $v$, the Krylov subspace of order $k$,
denoted by $\mathcal{K}(B, v, k)$, is defined as the subspace that is spanned by the vectors $\{v, Bv, \ldots, B^k v\}$.

Note that any vector in $\mathcal{K}(B, v, k)$ has to be of the form $p(B)v$, where $p$ is some polynomial of degree at most $k$.
The Lanczos method starts by generating an orthonormal basis for $\mathcal{K}(B, v, k)$. Let $v_0, \ldots, v_k$ be any
orthonormal basis for $\mathcal{K}(B, v, k)$, and let $V_k$ be the $n \times (k+1)$ matrix with $\{v_i\}_{i=0}^{k}$ as its columns.
Thus, $V_k^T V_k = I_{k+1}$ and $V_k V_k^T$ denotes the projection onto the subspace. Also, let $T_k$ be the operator $B$
in the basis $\{v_i\}_{i=0}^{k}$, restricted to this subspace, i.e., $T_k = V_k^T B V_k$. Since all the vectors $v, Bv, \ldots, B^k v$
are in the subspace, any of these vectors (or a linear combination of them) can be obtained by
applying $T_k$ to $v$ (after a change of basis), instead of $B$. The following lemma proves this formally.

Lemma 6.5 (Exact Computation with Polynomials. See e.g. [29]) Let $V_k$ be the orthonormal basis, and
$T_k$ be the operator $B$ restricted to $\mathcal{K}(B, v, k)$ where $\|v\| = 1$, i.e., $T_k = V_k^T B V_k$. Let $p$ be a polynomial of
degree at most $k$. Then,
$$p(B)v = V_k \, p(T_k) \, V_k^T v.$$

Proof: Recall that $V_k V_k^T$ is the orthogonal projection onto the subspace $\mathcal{K}(B, v, k)$. By linearity, it
suffices to prove this when $p$ is $x^t$ for $t \le k$. This is true for $t = 0$ since $V_k V_k^T v = v$. For any $j \le k$,
$B^j v$ lies in $\mathcal{K}(B, v, k)$; thus, for all $j \le k$, $V_k V_k^T B^j v = B^j v$. Hence,
$$B^t v = (V_k V_k^T) B (V_k V_k^T) B \cdots B (V_k V_k^T) v = V_k (V_k^T B V_k)(V_k^T B V_k) \cdots (V_k^T B V_k) V_k^T v = V_k T_k^t V_k^T v.$$



The following lemma shows that $V_k f(T_k) V_k^T v$ approximates $f(B)v$ as well as the best degree
$k$ polynomial that uniformly approximates $f$. The proof is based on the observation that if we
express $f$ as a sum of any degree $k$ polynomial and an error function, the above lemma shows that
the polynomial part is exactly computed in this approximation.

Lemma 6.6 (Approximation by Best Polynomial (Lemma 4.1, [29])) Let $V_k$ be the orthonormal basis,
and $T_k$ be the operator $B$ restricted to $\mathcal{K}(B, v, k)$ where $\|v\| = 1$, i.e., $T_k = V_k^T B V_k$. Let $f : \mathbb{R} \to \mathbb{R}$ be
any function such that $f(B)$ and $f(T_k)$ are well-defined. Then,
$$\|f(B)v - V_k f(T_k) V_k^T v\| \le \min_{p_k \in \Sigma_k} \Big( \max_{\lambda \in \Lambda(B)} |f(\lambda) - p_k(\lambda)| + \max_{\lambda \in \Lambda(T_k)} |f(\lambda) - p_k(\lambda)| \Big).$$
Proof: Let $p_k$ be any degree $k$ polynomial. Let $r_k = f - p_k$. Then,
$$\|f(B)v - V_k f(T_k) V_k^T v\| \le \|p_k(B)v - V_k p_k(T_k) V_k^T v\| + \|r_k(B)v - V_k r_k(T_k) V_k^T v\|$$
$$\le 0 + \|r_k(B)\| + \|V_k r_k(T_k) V_k^T\| \qquad \text{(using Lemma 6.5)}$$
$$= \max_{\lambda \in \Lambda(B)} |r_k(\lambda)| + \max_{\lambda \in \Lambda(T_k)} |r_k(\lambda)|.$$
Minimizing over $p_k$ gives us our lemma. ■



Observe that in order to compute this approximation, we do not need to know the polynomial
explicitly. It suffices to prove that there exists a degree $k$ polynomial that uniformly approximates
$f$ well on an interval containing the spectrum of $B$ and $T_k$ (for exact computation, $\Lambda(T_k) \subseteq [\lambda_n(B), \lambda_1(B)]$).
Moreover, if $k \ll n$, the computation has been reduced to a much smaller matrix. We now show
that an orthonormal basis for the Krylov Subspace, $V_k$, can be computed quickly and then describe
the Lanczos procedure.

Input: A symmetric matrix $B \succeq 0$, a vector $v$ such that $\|v\| = 1$, a positive integer $k$, and a
function $f : \mathbb{R} \to \mathbb{R}$.

Output: A vector $u$ that is an approximation to $f(B)v$.

1. Initialize $v_0 = v$.

2. For $i = 0$ to $k - 1$, (Construct an orthonormal basis to Krylov subspace of order $k$)

a. If $i = 0$, compute $w_0 = Bv_0$. Else, compute $w_i = Bv_i - \beta_i v_{i-1}$. (Orthogonalize w.r.t. $v_{i-1}$)

b. Define $\alpha_i = v_i^T w_i$ and $w_i' = w_i - \alpha_i v_i$ *. (Orthogonalize w.r.t. $v_i$)

c. Define $\beta_{i+1} = \|w_i'\|$ and $v_{i+1} = w_i' / \beta_{i+1}$. (Scaling it to norm 1)

3. Let $V_k$ be the $n \times (k+1)$ matrix whose columns are $v_0, \ldots, v_k$ respectively.

4. Let $T_k$ be the $(k+1) \times (k+1)$ matrix such that for all $i$, $(T_k)_{ii} = v_i^T B v_i = \alpha_i$, $(T_k)_{i,i+1} =
(T_k)_{i+1,i} = v_{i+1}^T B v_i = \beta_{i+1}$ and all other entries are 0. (Compute $T_k = V_k^T B V_k$)

5. Compute $\mathcal{B} = f(T_k)$ exactly via eigendecomposition. Output the vector $V_k \mathcal{B} V_k^T v$.

* If $w_i' = 0$, compute the approximation with the matrices $T_{i-1}$ and $V_{i-1}$ instead of $T_k$ and $V_k$. The
error bounds still hold.

Figure 4: The Lanczos algorithm for approximating $f(B)v$
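
The procedure of Figure 4 can be sketched in a few lines for $f(x) = e^{-x}$ and a small dense matrix. This is an illustration under stated assumptions rather than the implementation analyzed here: $f(T_k)$ is applied to the first basis vector via a truncated Taylor series instead of an eigendecomposition (adequate only when $\|T_k\|$ is small), the loop runs one extra step so that the last diagonal entry $\alpha_k$ is available, early termination simply truncates the basis, and all function names are ours.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in m]


def lanczos_basis(b: Matrix, v: Vector, k: int) -> Tuple[List[Vector], Vector, Vector]:
    """Return the basis v_0..v_k, the diagonal alphas and off-diagonal betas."""
    vs = [v[:]]
    alphas: Vector = []
    betas: Vector = []
    for i in range(k + 1):
        w = matvec(b, vs[i])
        if i > 0:  # orthogonalize against v_{i-1}
            w = [wj - betas[i - 1] * pj for wj, pj in zip(w, vs[i - 1])]
        alpha = dot(vs[i], w)
        alphas.append(alpha)
        w = [wj - alpha * cj for wj, cj in zip(w, vs[i])]  # against v_i
        if i == k:
            break
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:  # the Krylov subspace is exhausted
            break
        betas.append(beta)
        vs.append([wj / beta for wj in w])
    return vs, alphas, betas


def tridiag_matvec(alphas: Vector, betas: Vector, x: Vector) -> Vector:
    y = [a * xi for a, xi in zip(alphas, x)]
    for i, beta in enumerate(betas):
        y[i] += beta * x[i + 1]
        y[i + 1] += beta * x[i]
    return y


def exp_neg_times_e1(alphas: Vector, betas: Vector, terms: int = 60) -> Vector:
    """Taylor approximation of exp(-T) e_1 (assumes the norm of T is small)."""
    term = [1.0] + [0.0] * (len(alphas) - 1)
    result = term[:]
    for j in range(1, terms + 1):
        term = [-x / j for x in tridiag_matvec(alphas, betas, term)]
        result = [r + x for r, x in zip(result, term)]
    return result


def lanczos_exp(b: Matrix, v: Vector, k: int) -> Vector:
    """Approximate exp(-B) v by V_k exp(-T_k) V_k^T v, using V_k^T v = e_1."""
    vs, alphas, betas = lanczos_basis(b, v, k)
    coeffs = exp_neg_times_e1(alphas, betas)
    return [sum(c * vec[j] for c, vec in zip(coeffs, vs)) for j in range(len(v))]
```

When $k = n - 1$ and the Krylov subspace is the whole space, the output agrees with $\exp(-B)v$ up to rounding, in line with Lemma 6.5; for smaller $k$ the error is governed by Lemma 6.6.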
6.1.1 Efficiently Computing a Basis for the Krylov Subspace 

In this section, we show that if we construct the basis $\{v_i\}_{i=0}^{k}$ in a particular way, the matrix $T_k$ has
extra structure. In particular, if $B$ is symmetric, we show that $T_k$ must be tridiagonal. This will help
us speed up the construction of the basis.

Suppose we compute the orthonormal basis $\{v_i\}_{i=0}^{k}$ iteratively, starting from $v_0 = v$. For $i = 0, \ldots, k$,
we compute $Bv_i$ and remove the components along the vectors $\{v_0, \ldots, v_i\}$ to obtain a new
vector that is orthogonal to the previous vectors. This vector, scaled to norm 1, is defined to be $v_{i+1}$.
These vectors, by construction, satisfy that for all $i \le k$, $\mathrm{Span}\{v_0, \ldots, v_i\} = \mathrm{Span}\{v, Bv, \ldots, B^i v\}$.
Note that $(T_k)_{ij} = v_i^T B v_j$.

If we construct the basis iteratively as above, $Bv_j \in \mathrm{Span}\{v_0, \ldots, v_{j+1}\}$ by construction, and if
$i > j + 1$, $v_i$ is orthogonal to this subspace and hence $v_i^T (Bv_j) = 0$. Thus, $T_k$ is Upper Hessenberg,
i.e., $(T_k)_{ij} = 0$ for $i > j + 1$.

Moreover, if $B$ is symmetric, $v_j^T (Bv_i) = v_i^T (Bv_j)$, and hence $T_k$ is symmetric and tridiagonal.
This means that at most three coefficients are non-zero in each row. Thus, while constructing
the basis, at step $i + 1$, we need to orthogonalize $Bv_i$ only w.r.t. $v_{i-1}$ and $v_i$. This fact is used
for efficient computation of $T_k$. The algorithm Lanczos appears in Figure 4 and the following
meta-theorem summarizes the main result regarding this method.

Theorem 6.7 (Lanczos Theorem) Given a symmetric p.s.d. matrix $B$, a vector $v$ with $\|v\| = 1$, a
function $f$ and a positive integer parameter $k$ as inputs, the procedure Lanczos computes a vector $u$ such
that
$$\|f(B)v - u\| \le 2 \cdot \min_{p_k \in \Sigma_k} \max_{\lambda \in [\lambda_n(B), \lambda_1(B)]} |f(\lambda) - p_k(\lambda)|.$$
Here $\Sigma_k$ denotes the set of all polynomials of degree at most $k$, and $\lambda_1(B)$ and $\lambda_n(B)$ denote the largest
and the smallest eigenvalues of $B$. The time taken by Lanczos is $O((n + t_B)k + k^2)$.

Proof: The algorithm Lanczos implements the Lanczos method we've discussed here. The
guarantee on $u$ follows from Lemma 6.6 and the fact that both $\Lambda(B)$ and $\Lambda(T_k)$ are contained in
$[\lambda_n(B), \lambda_1(B)]$. We use the fact that $(T_k)_{ij} = v_i^T B v_j$ and that $T_k$ must be tridiagonal to reduce our
work to just computing $O(k)$ entries in $T_k$. The total running time is dominated by $k$ multiplications of
$B$ with a vector, $O(k)$ dot-products and the eigendecomposition of the tridiagonal matrix $T_k$ to compute
$f(T_k)$ (which can be done in $O(k^2)$ time [26]), giving a total running time of $O((n + t_B)k + k^2)$. ■

6.2 Procedures for Approximating exp(-A)v

Having introduced the Lanczos method, we describe the algorithms we use for approximating the
matrix exponential.

6.2.1 Using Lanczos for Approximating exp(-A)v - Proof of Theorem 1.4

Theorem 1.4 follows from Theorem 6.2, which is proved by combining Theorem 6.7 about the
approximation guarantee of the Lanczos algorithm and Theorem 7.1 (a more precise version of
Theorem 1.5) about polynomials approximating $e^{-x}$. We now give a proof of Theorem 6.2.

Proof: We are given a matrix $A$, a unit vector $v$ and an error parameter $\delta$. Let $p_{\lambda_n(A), \lambda_1(A), \delta/2}(x)$
be the polynomial given by Theorem 7.1 and let $k$ be its degree. We know from the theorem
that $p_{\lambda_n(A), \lambda_1(A), \delta/2}(x)$ satisfies $\sup_{x \in [\lambda_n(A), \lambda_1(A)]} |e^{-x} - p_{\lambda_n(A), \lambda_1(A), \delta/2}(x)| \le \frac{\delta}{2} \cdot e^{-\lambda_n(A)}$, and that its
degree is
$$k = O\Big( \sqrt{\max\{\log^2 1/\delta, \ (\lambda_1(A) - \lambda_n(A)) \cdot \log 1/\delta\}} \cdot (\log 1/\delta) \cdot \log\log 1/\delta \Big).$$
Now, we run the Lanczos procedure with the matrix $A$, the vector $v$, function $f(x) = e^{-x}$ and
parameter $k$ as inputs, and output the vector $u$ returned by the procedure. In order to prove the
error guarantee, we use Theorem 6.7 and bound the error using the polynomial $p_{\lambda_n(A), \lambda_1(A), \delta/2}$. Let
$r(x) = \exp(-x) - p_{\lambda_n(A), \lambda_1(A), \delta/2}(x)$. We get
$$\|\exp(-A)v - u\| \overset{\text{Thm. 6.7}}{\le} 2 \max_{\lambda \in [\lambda_n(A), \lambda_1(A)]} |r(\lambda)| \overset{\text{Thm. 7.1}}{\le} \delta \cdot e^{-\lambda_n(A)}.$$
By Theorem 6.7, the total running time is $O((n + t_A)k + k^2)$. ■



6.2.2 The ExpRational Algorithm. 

Now we move on to applying the Lanczos method in a way that was suggested as a heuristic by Eshof and Hochbruck [13]. The starting point here is the following result by Saff, Schönhage and Varga [30], which shows that simple rational functions provide uniform approximations to $e^{-x}$ over $[0,\infty)$ where the error term decays exponentially with the degree. Asymptotically, this result is
best possible, see ITtI .

Theorem 6.8 (Rational Approximation [30]) There exist constants $c_1 \ge 1$ and $k_0$ such that, for any integer $k \ge k_0$, there exists a polynomial $P_k(x)$ of degree $k - 1$ such that,

\[ \sup_{x \in [0,\infty)} \left| \exp(-x) - \frac{P_k(x)}{(1 + x/k)^k} \right| \le c_1 k \cdot 2^{-k} . \]



Note that the rational function given by the above theorem can be written as a polynomial in $(1 + x/k)^{-1}$. The following corollary makes this formal.

Corollary 6.9 (Polynomial in $(1 + x/k)^{-1}$) There exist constants $c_1 \ge 1$ and $k_0$ such that, for any integer $k \ge k_0$, there exists a polynomial $p_k^\star(x)$ of degree $k$ such that $p_k^\star(0) = 0$, and,

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p_k^\star(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p_k^\star\left((1 + x/k)^{-1}\right) \right| \le c_1 k \cdot 2^{-k} . \qquad (4) \]



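Proof: Define $p_k^\star$ as $p_k^\star(t) = t^k \cdot P_k(k/t - k)$, where $P_k$ is the polynomial from Theorem 6.8. Note that since $P_k$ is a polynomial of degree $k - 1$, $p_k^\star$ is a polynomial of degree $k$ with the constant term being zero, i.e., $p_k^\star(0) = 0$. Also, for any $k \ge k_0$,

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p_k^\star(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p_k^\star\left((1 + x/k)^{-1}\right) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - \frac{P_k(x)}{(1 + x/k)^k} \right| \le c_1 k \cdot 2^{-k} . \]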



The corollary above inspires the application of the Lanczos method to obtain the ExpRational algorithm that appears in Figure 5. We would like to work with the function $f(x) = e^{k(1 - 1/x)}$ and the matrix $B = (I + A/k)^{-1}$ for some positive integer $k$ and use the Lanczos method to compute an approximation to $\exp(-A)v$ in the Krylov subspace $\mathcal{K}(B, v, k)$, for small $k$. This is equivalent to looking for uniform approximations to $\exp(-y)$ that are degree $k$ polynomials in $(1 + y/k)^{-1}$.
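The change of variables used here, namely that $f(x) = e^{k(1 - 1/x)}$ evaluated at $B = (I + A/k)^{-1}$ equals $\exp(-A)$, can be checked numerically. The following is a small sketch with hypothetical helper names (`exp_neg_via_inverse`, `exp_neg_direct`); it uses dense eigendecompositions and is meant only as a sanity check, not as an efficient method.

```python
import numpy as np

def exp_neg_via_inverse(A, k):
    """Evaluate f(B) with f(x) = exp(k (1 - 1/x)) and B = (I + A/k)^{-1}."""
    n = A.shape[0]
    B = np.linalg.inv(np.eye(n) + A / k)
    B = (B + B.T) / 2                      # remove rounding asymmetry
    mu, Q = np.linalg.eigh(B)              # eigenvalues of B lie in (0, 1]
    return Q @ np.diag(np.exp(k * (1.0 - 1.0 / mu))) @ Q.T

def exp_neg_direct(A):
    """Evaluate exp(-A) for a symmetric matrix A by eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.exp(-lam)) @ Q.T
```

For a symmetric PSD matrix `A` and any positive integer `k`, the two functions should agree up to rounding, since each eigenvalue $\lambda$ of $A$ maps to $\mu = (1 + \lambda/k)^{-1}$ and $k(1 - 1/\mu) = -\lambda$.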

Unfortunately, we can't afford to exactly compute the vector $(I + A/k)^{-1}y$ for a given vector $y$. Instead, we will resort to a fast but error-prone solver, e.g. the Conjugate Gradient method and the Spielman-Teng SDD solver (Theorem 6.10). Since the computation is now approximate, the results for the Lanczos method no longer apply. Dealing with the error poses a significant challenge as the Lanczos method is iterative and the error can propagate quite rapidly. A significant new and technical part of the paper is devoted to carrying out the error analysis in this setting. The details appear in Section 6.5.

Moreover, due to inexact computation, we can no longer assume $B$ is symmetric. Hence, we perform complete orthonormalization while computing the basis $\{v_i\}_{i=0}^{k}$. We also define the symmetric matrix $\hat{T}_k = 1/2 \cdot (T_k^\top + T_k)$ and compute our approximation using this matrix. The complete procedure ExpRational, with the exception of specifying the choice of parameters, is described in Figure 5. We give a proof of Theorem 6.1 in Section 6.5.



6.3 Exponentiating PSD Matrices - Proofs of Theorems 1.2 and 1.3

In this section, we give a proof of Theorem 1.2 and Theorem 1.3 assuming Theorem 6.1. Our algorithms for these theorems are based on combining the ExpRational algorithm with appropriate Invert_A procedures.

Input: A matrix $A \succeq 0$, a vector $v$ such that $\|v\| = 1$, and an approximation parameter $\varepsilon$.

Output: A vector $u$ such that $\|\exp(-A)v - u\| \le \varepsilon$.

Parameters: Let $k = O(\log 1/\varepsilon)$ and $\varepsilon_1 = \exp(-\Theta(k \log k + \log(1 + \|A\|)))$.

1. Initialize $v_0 = v$.

2. For $i = 0$ to $k - 1$, (Construct an orthonormal basis to the Krylov subspace of order $k$)

a. Call the procedure Invert_A$(v_i, k, \varepsilon_1)$. The procedure returns a vector $w_i$, such that, $\|(I + A/k)^{-1}v_i - w_i\| \le \varepsilon_1 \|v_i\|$. (Approximate $(I + A/k)^{-1}v_i$)

b. For $j = 0, \ldots, i$,

i. Let $\alpha_{j,i} = v_j^\top w_i$. (Compute projection onto $v_j$)

c. Define $w_i' = w_i - \sum_{j=0}^{i} \alpha_{j,i} v_j$. (Orthogonalize w.r.t. $v_j$ for $j \le i$)

d. Let $\alpha_{i+1,i} = \|w_i'\|$ * and $v_{i+1} = w_i'/\alpha_{i+1,i}$. (Scaling it to norm 1)

e. For $j = i + 2, \ldots, k$,

i. Let $\alpha_{j,i} = 0$.

3. Let $V_k$ be the $n \times (k + 1)$ matrix whose columns are $v_0, \ldots, v_k$ respectively.

4. Let $T_k$ be the $(k + 1) \times (k + 1)$ matrix $(\alpha_{i,j})_{i,j \in \{0,\ldots,k\}}$ and $\hat{T}_k = 1/2 \cdot (T_k^\top + T_k)$. (Symmetrize $T_k$)

5. Compute $\mathcal{B} = \exp(k \cdot (I - \hat{T}_k^{-1}))$ exactly and output the vector $V_k \mathcal{B} e_1$.

* If $w_i' = 0$, compute the approximation with the matrices $T_{i-1}$ and $V_{i-1}$, instead of $T_k$ and $V_k$. The error bounds still hold.

Figure 5: The ExpRational algorithm for approximating $\exp(-A)v$

6.3.1 SDD Matrices - Proof of Theorem 1.2

For Theorem 1.2 about exponentiating SDD matrices, we implement the Invert_A procedure using the Spielman-Teng SDD solver [37]. Here, we state an improvement on the Spielman-Teng result by Koutis, Miller and Peng [21].

Theorem 6.10 (SDD Solver [21]) Given a system of linear equations $Mx = b$, where the matrix $M$ is SDD, and an error parameter $\varepsilon > 0$, it is possible to obtain a vector $u$ that is an approximate solution to the system, in the sense that

\[ \|u - M^{-1}b\|_M \le \varepsilon \|M^{-1}b\|_M . \]

The time required for this computation is $\tilde{O}(m_M \log n \log 1/\varepsilon)$, where $M$ is an $n \times n$ matrix with $m_M$ non-zero entries. (The tilde hides $\log\log n$ factors.)

We restate Theorem 1.2 for completeness.

Theorem 6.11 (Theorem 1.2 Restated) Given an $n \times n$ symmetric matrix $A$ which is SDD, a vector $v$ and a parameter $\delta \le 1$, there is an algorithm that can compute a vector $u$ such that $\|\exp(-A)v - u\| \le \delta \|v\|$ in time $\tilde{O}((m_A + n) \log(2 + \|A\|))$. The tilde hides $\mathrm{poly}(\log n)$ and $\mathrm{poly}(\log 1/\delta)$ factors.

Proof: We use the ExpRational procedure to approximate the exponential. We only need to describe how to implement the Invert_A procedure for an SDD matrix $A$. Recall that the procedure Invert_A, given a vector $y$, a positive integer $k$ and real parameter $\varepsilon_1 > 0$, is supposed to return a vector $u_1$ such that $\|(I + A/k)^{-1}y - u_1\| \le \varepsilon_1 \|y\|$, in time $T^{inv}_{A,k,\varepsilon_1}$. Also, observe that this is equivalent to approximately solving the linear system $(I + A/k)z = y$ for the vector $z$.

If the matrix $A$ is SDD, $(I + A/k)$ is also SDD, and hence, we can use the Spielman-Teng SDD solver to implement Invert_A. We use Theorem 6.10 with inputs $(I + A/k)$, the vector $y$ and error parameter $\varepsilon_1$. It returns a vector $u_1$ such that,

\[ \|(I + A/k)^{-1}y - u_1\|_{(I + A/k)} \le \varepsilon_1 \|(I + A/k)^{-1}y\|_{(I + A/k)} . \]

This implies that,

\[ \begin{aligned}
\|(I + A/k)^{-1}y - u_1\|^2 &= ((I + A/k)^{-1}y - u_1)^\top ((I + A/k)^{-1}y - u_1) \\
&\le ((I + A/k)^{-1}y - u_1)^\top (I + A/k) ((I + A/k)^{-1}y - u_1) \\
&= \|(I + A/k)^{-1}y - u_1\|^2_{(I + A/k)} \le \varepsilon_1^2 \cdot \|(I + A/k)^{-1}y\|^2_{(I + A/k)} \\
&= \varepsilon_1^2 \cdot y^\top (I + A/k)^{-1} y \le \varepsilon_1^2 \cdot y^\top y ,
\end{aligned} \]

which gives us $\|(I + A/k)^{-1}y - u_1\| \le \varepsilon_1 \|y\|$, as required for Invert_A. Thus, Theorem 6.1 implies that the procedure ExpRational computes a vector $u$ approximating $e^{-A}v$, as desired.

The time required for the computation of $u_1$ is $T^{inv}_{A,k,\varepsilon_1} = \tilde{O}((m_A + n) \log n \log 1/\varepsilon_1)$, and hence from Theorem 6.1, the total running time is $\tilde{O}((m_A + n) \log n (\log 1/\delta + \log(1 + \|A\|)) \log 1/\delta + (\log 1/\delta)^3)$, where the tilde hides polynomial factors in $\log\log n$ and $\log\log 1/\delta$. ■

6.3.2 General PSD Matrices - Proof of Theorem 1.3

For Theorem 1.3 about exponentiating general PSD matrices, we implement the Invert_A procedure using the Conjugate Gradient method. We use the following theorem.

Theorem 6.12 (Conjugate Gradient Method. See [34]) Given a system of linear equations $Mx = b$ and an error parameter $\varepsilon > 0$, it is possible to obtain a vector $u$ that is an approximate solution to the system, in the sense that

\[ \|u - M^{-1}b\|_M \le \varepsilon \|M^{-1}b\|_M . \]

The time required for this computation is $O(t_M \sqrt{\kappa(M)} \log 1/\varepsilon)$, where $\kappa(M)$ denotes the condition number of $M$.

We restate Theorem 1.3 for completeness.

Theorem 6.13 (Theorem 1.3 Restated) Given an $n \times n$ symmetric PSD matrix $A$, a vector $v$ and a parameter $\delta \le 1$, there is an algorithm that can compute a vector $u$ such that $\|\exp(-A)v - u\| \le \delta \|v\|$ in time $\tilde{O}((t_A + n)\sqrt{1 + \|A\|} \log(2 + \|A\|))$. Here the tilde hides $\mathrm{poly}(\log n)$ and $\mathrm{poly}(\log 1/\delta)$ factors.

Proof: We use the ExpRational procedure to approximate the exponential. We run the Conjugate Gradient method on input $(I + A/k)$, the vector $y$ and error parameter $\varepsilon_1$. The method returns a vector $u_1$ with the same guarantee as the SDD solver. As in Theorem 1.2, this implies $\|(I + A/k)^{-1}y - u_1\| \le \varepsilon_1 \|y\|$, as required for Invert_A. Thus, Theorem 6.1 implies that the procedure ExpRational computes a vector $u$ approximating $e^{-A}v$, as desired.

We can compute $u_1$ in time $T^{inv}_{A,k,\varepsilon_1} = O(t_A \sqrt{\kappa(I + A/k)} \log 1/\varepsilon_1) = O(t_A \sqrt{1 + \|A\|} \log 1/\varepsilon_1)$, and hence from Theorem 6.1, the total running time is

\[ \tilde{O}\left( t_A \sqrt{1 + \|A\|} \, (\log 1/\delta + \log(1 + \|A\|)) \log 1/\delta + (\log 1/\delta)^3 \right) , \]

where the tilde hides polynomial factors in $\log\log n$ and $\log\log 1/\delta$. ■
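One way to realize the Invert_A procedure with Conjugate Gradient is sketched below. The helper names `conjugate_gradient` and `make_invert` are hypothetical, and the stopping rule (relative residual at most `tol`) is an assumption of this sketch rather than the guarantee stated in Theorem 6.12. Because $I + A/k \succeq I$ for PSD $A$, a residual of norm at most $\varepsilon_1 \|y\|$ already implies $\|(I + A/k)^{-1}y - u_1\| \le \varepsilon_1 \|y\|$.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol, max_iter=1000):
    """Plain CG for a symmetric positive definite operator given by matvec.

    Stops once the (recursively updated) residual satisfies
    ||r|| <= tol * ||b||.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def make_invert(A, k, eps1):
    """Return a callable y -> approximation of (I + A/k)^{-1} y."""
    matvec = lambda y: y + (A @ y) / k     # only needs products with A
    return lambda y: conjugate_gradient(matvec, y, eps1)
```

Each CG iteration costs one multiplication by $A$ plus $O(n)$ work, which matches the $t_A$ dependence in the running time above.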



6.4 Beyond SDD - Proof of Theorem 3.2

In this section, we give a proof of Theorem 3.2, which we restate below.

Theorem 6.14 (Theorem 3.2 Restated) Given an $n \times n$ symmetric matrix $A = \Pi HMH \Pi$ where $M$ is SDD, $H$ is a diagonal matrix with strictly positive entries and $\Pi$ is a rank $(n - 1)$ projection matrix $\Pi = I - ww^\top$ ($w$ is explicitly known and $\|w\| = 1$), a vector $v$ and a parameter $\delta \le 1$, there is an algorithm that can compute a vector $u$ such that $\|\exp(-A)v - u\| \le \delta \|v\|$ in time $\tilde{O}((m_M + n) \log(2 + \|HMH\|))$. The tilde hides $\mathrm{poly}(\log n)$ and $\mathrm{poly}(\log 1/\delta)$ factors.

Proof: In order to prove this, we will use the ExpRational procedure. For $A = \Pi HMH \Pi$, Lemma 6.15 given below implements the required Invert_A procedure. A proof of this lemma is given later in this section.

Lemma 6.15 (Invert_A Procedure for Theorem 3.2) Given a positive integer $k$, vector $y$, an error parameter $\varepsilon_1$, a rank $(n - 1)$ projection matrix $\Pi = I - ww^\top$ (where $\|w\| = 1$ and $w$ is explicitly known), a diagonal matrix $H$ with strictly positive entries, and an invertible SDD matrix $M$ with $m_M$ non-zero entries; we can compute a vector $u$ such that $\|(I + 1/k \cdot \Pi HMH \Pi)^{-1}y - u\| \le \varepsilon_1 \|(I + 1/k \cdot \Pi HMH \Pi)^{-1}y\|$, in time $\tilde{O}((m_M + n) \log n \log \frac{1 + \|HMH\|}{\varepsilon_1})$. (The tilde hides $\mathrm{poly}(\log\log n)$ factors.)

Combining this lemma with Theorem 6.1 about the ExpRational procedure, we get that we can compute the desired vector $u$ approximating $e^{-A}v$ in total time

\[ \tilde{O}\left( (m_M + n) \log n \, (\log 1/\delta + \log(1 + \|HMH\|)) \log 1/\delta + (\log 1/\delta)^3 \right) , \]

where the tilde hides polynomial factors in $\log\log n$ and $\log\log 1/\delta$. ■

In order to prove Lemma 6.15, we need to show how to approximate the inverse of a matrix of the form $HMH$, where $H$ is diagonal and $M$ is SDD. The following lemma achieves this.

Lemma 6.16 Given a vector $y$, an error parameter $\varepsilon_1$, a diagonal matrix $H$ with strictly positive entries, and an invertible SDD matrix $M$ with $m_M$ non-zero entries; we can compute a vector $u$ such that $\|(HMH)^{-1}y - u\|_{HMH} \le \varepsilon_1 \|(HMH)^{-1}y\|_{HMH}$, in time $\tilde{O}((m_M + n) \log n \log 1/\varepsilon_1)$. (The tilde hides factors of $\log\log n$.)

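Proof: Observe that $(HMH)^{-1}y = H^{-1}M^{-1}H^{-1}y$. Use the SDD solver (Theorem 6.10) with inputs $M$, vector $H^{-1}y$ and parameter $\varepsilon_1$ to obtain a vector $u_1$ such that,

\[ \|M^{-1}(H^{-1}y) - u_1\|_M \le \varepsilon_1 \|M^{-1}H^{-1}y\|_M . \]

Return the vector $u = H^{-1}u_1$. We can bound the error in the output vector $u$ as follows.

\[ \begin{aligned}
\|(HMH)^{-1}y - u\|^2_{HMH} &= \|H^{-1}M^{-1}H^{-1}y - H^{-1}u_1\|^2_{HMH} \\
&= (H^{-1}M^{-1}H^{-1}y - H^{-1}u_1)^\top (HMH) (H^{-1}M^{-1}H^{-1}y - H^{-1}u_1) \\
&= (M^{-1}H^{-1}y - u_1)^\top M (M^{-1}H^{-1}y - u_1) \\
&= \|M^{-1}(H^{-1}y) - u_1\|^2_M \le \varepsilon_1^2 \|M^{-1}H^{-1}y\|^2_M \\
&= \varepsilon_1^2 (H^{-1}M^{-1}H^{-1}y)^\top (HMH) (H^{-1}M^{-1}H^{-1}y) \\
&= \varepsilon_1^2 \|H^{-1}M^{-1}H^{-1}y\|^2_{HMH} = \varepsilon_1^2 \|(HMH)^{-1}y\|^2_{HMH} .
\end{aligned} \]

Thus, $\|(HMH)^{-1}y - u\|_{HMH} \le \varepsilon_1 \|(HMH)^{-1}y\|_{HMH}$. Since $H$ is diagonal, multiplication by $H^{-1}$ requires $O(n)$ time. Hence, the total time is dominated by the SDD solver, giving a total running time of $\tilde{O}((m_M + n) \log n \log 1/\varepsilon_1)$. ■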
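The reduction in this proof is short enough to sketch directly. In the snippet below the name `solve_hmh` and the callback `solve_M` (a stand-in for the SDD solver of Theorem 6.10) are hypothetical; the diagonal matrix $H$ is passed as the vector `h` of its diagonal entries.

```python
import numpy as np

def solve_hmh(h, y, solve_M):
    """Return H^{-1} M^{-1} H^{-1} y for H = diag(h), h > 0 entrywise.

    solve_M(b) is assumed to return an (approximate) solution of M x = b.
    """
    u1 = solve_M(y / h)     # approximates M^{-1} (H^{-1} y)
    return u1 / h           # apply H^{-1} once more: O(n) work
```

The two divisions by `h` are the $O(n)$ multiplications by $H^{-1}$ mentioned in the proof, so the cost is dominated by the single call to the solver for $M$.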

Now, we prove Lemma 6.15.

Lemma 6.17 (Lemma 6.15 Restated) Given a positive integer $k$, vector $y$, an error parameter $\varepsilon_1$, a rank $(n - 1)$ projection matrix $\Pi = I - ww^\top$ (where $\|w\| = 1$ and $w$ is explicitly known), a diagonal matrix $H$ with strictly positive entries, and an invertible SDD matrix $M$ with $m_M$ non-zero entries; we can compute a vector $u$ such that $\|(I + 1/k \cdot \Pi HMH \Pi)^{-1}y - u\| \le \varepsilon_1 \|(I + 1/k \cdot \Pi HMH \Pi)^{-1}y\|$, in time $\tilde{O}((m_M + n) \log n \log \frac{1 + \|HMH\|}{\varepsilon_1})$. (The tilde hides $\mathrm{poly}(\log\log n)$ factors.)

1. Compute $z = y - (w^\top y)w$.

2. Estimate $(I + M_1)^{-1}z$ with error parameter $\frac{\varepsilon_1}{6(1 + \|M_1\|)}$. Denote the vector returned by $\beta_1$.

3. Estimate $(I + M_1)^{-1}w$ with error parameter $\frac{\varepsilon_1}{6(1 + \|M_1\|)}$. Denote the vector returned by $\beta_2$.

4. Compute 



def g IV Ml|6l . , , T ^ /m 



Return $u_1$.

Figure 6: The Invert_A procedure for Theorem 3.2

Proof: We sketch the proof idea first. Using the fact that $w$ is an eigenvector of our matrix, we will split $y$ into two components - one along $w$ and one orthogonal. Along $w$, we can easily compute the component of the required vector. Among the orthogonal component, we will write our matrix as the sum of $I + 1/k \cdot HMH$ and a rank one matrix, and use the Sherman-Morrison formula to express its inverse. Note that we can use Lemma 6.16 to compute the inverse of $I + 1/k \cdot HMH$. The procedure is described in Figure 6 and the proof for the error analysis is given below.

Let $M_1 = 1/k \cdot HMH$. Then, $I + 1/k \cdot \Pi HMH \Pi = I + \Pi M_1 \Pi$. Without loss of generality, we will assume that $\|y\| = 1$. Note that $I + \Pi M_1 \Pi \succ 0$, and hence is invertible. Let $z = y - (w^\top y)w$. Thus, $w^\top z = 0$. Since $w$ is an eigenvector of $(I + \Pi M_1 \Pi)$ with eigenvalue 1, we get,

\[ (I + \Pi M_1 \Pi)^{-1} y = (I + \Pi M_1 \Pi)^{-1} z + (w^\top y) w . \qquad (5) \]

Let's say $t = (I + \Pi M_1 \Pi)^{-1} z$. Then, $t + \Pi M_1 \Pi t = z$. Left-multiplying by $w^\top$, we get, $w^\top t = w^\top z = 0$. Thus, $\Pi t = t$, and hence $(I + \Pi M_1)t = z$, or equivalently, $t = (I + \Pi M_1)^{-1} z$.

(I + nMin)-iz = (I + nMO^^z ={I + Mi- ww'^Mi)-^z = (; + Mi - iv{Miivy)-^z 

{I + MiY 



1 {I + Mi)-^ww'^ Mi{I + Mi)-^ 



1 + iv^ Mi{I + Mi)-^w 
(Sherman-Morrison formula) 

/r 1 W^Mlfl + Ml)"^Z ,^ 1 

= (J + Mi)"^z ^-^^ " , , (I + Miy^w (6) 

Since we can write $I + M_1 = I + 1/k \cdot HMH = H(H^{-2} + 1/k \cdot M)H$, we can use Lemma 6.16 to estimate $(I + M_1)^{-1}z$ and $(I + M_1)^{-1}w$. Using Equation (6), the procedure for estimating $(I + \Pi M_1 \Pi)^{-1}y$ is described in Figure 6.
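The decomposition in Equation (5), together with the identity $t = (I + \Pi M_1)^{-1}z$, can be checked numerically. The sketch below uses a hypothetical function name `projected_inverse_apply` and exact dense solves; it deliberately does not use the Sherman-Morrison step or any approximate solver, so it only illustrates the splitting of $y$ along $w$ and its orthogonal complement.

```python
import numpy as np

def projected_inverse_apply(M1, w, y):
    """Compute (I + Pi M1 Pi)^{-1} y with Pi = I - w w^T and ||w|| = 1.

    Splits y = z + (w^T y) w with w^T z = 0 and uses
    (I + Pi M1 Pi)^{-1} z = (I + Pi M1)^{-1} z.
    """
    n = len(y)
    z = y - (w @ y) * w
    # Pi M1 = M1 - w (w^T M1); w @ M1 is the row vector w^T M1
    pi_m1 = M1 - np.outer(w, w @ M1)
    t = np.linalg.solve(np.eye(n) + pi_m1, z)
    return t + (w @ y) * w
```

Comparing the result with a direct solve against $I + \Pi M_1 \Pi$ for a random PSD matrix `M1` is a quick way to confirm the algebra before the approximate estimates $\beta_1$, $\beta_2$ are introduced.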

We need to upper bound the error in the above estimation procedure. From the assumption, we know that $\beta_1 = (I + M_1)^{-1}z - e_1$, where $\|e_1\|_{(I + M_1)} \le \frac{\varepsilon_1}{6(1 + \|M_1\|)} \|(I + M_1)^{-1}z\|_{(I + M_1)}$, and $\beta_2 = (I + M_1)^{-1}w - e_2$, where $\|e_2\|_{(I + M_1)} \le \frac{\varepsilon_1}{6(1 + \|M_1\|)} \|(I + M_1)^{-1}w\|_{(I + M_1)}$. Combining Equations (5) and (6) and subtracting Equation (7), we can write the error as,



wTMi(7 + Mi)"^z-r<;TMie2 



(1 + w^Mi{I + Mi)-%)(1 + w^Mi[{I + Mi)-hv - ei]) 

w'^Miei r^f, M^-i - 1 a;^Mi(/ + Mi)~^z 

1 + w^Mi[{I + Mi)-iw - 62] l + ryTMi[(; + Mi)-%-e2] 



< 1 



62 



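Let us first bound the scalar terms. Note that $\|z\| \le \|y\| = 1$.

\[ |w^\top M_1 (I + M_1)^{-1} z| \le \|M_1 (I + M_1)^{-1} z\| \le \|M_1 (I + M_1)^{-1}\| \le 1 , \]

\[ |w^\top M_1 e_1| \le \|w\| \|M_1\| \|e_1\| \le \|M_1\| \|e_1\|_{(I + M_1)} \le \varepsilon_1/6 \cdot \|(I + M_1)^{-1} z\|_{(I + M_1)} = \varepsilon_1/6 \cdot \|(I + M_1)^{-1/2} z\| \le \varepsilon_1/6 . \]

Similarly, $|w^\top M_1 e_2| \le \varepsilon_1/6$. Also, $M_1 (I + M_1)^{-1} \succeq 0$ and hence $w^\top M_1 (I + M_1)^{-1} w \ge 0$. Thus,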



{I + UMiUy^y-Ui < {I + UMiUy^y-Ui 



. II II 1 ■ ^i/6 

- 11^1 II a+MO + 1.(1 _.,/6) 



(J + Mi)-^zi; 



(1-^6) 



< 



6(1 + ||Mi| 



(7 + Mi)-iz 



4 ■ £i/6 



(7+Mi) 1 — £i/6 



7 + Mi)-iz<; 



(J+Mi) 



(I + Mi 



(I + Mi 



^-1/2 



Other than the estimation of $(I + M_1)^{-1}z$ and $(I + M_1)^{-1}w$, we need to compute a constant number of dot products and a constant number of matrix-vector products with the matrix $M_1$. Multiplying a vector with $M_1 = 1/k \cdot HMH$ takes time $O(m_M + n)$, giving a total time of $\tilde{O}((m_M + n) \log n \log \frac{1 + \|HMH\|}{\varepsilon_1})$.

6.5 Error Analysis for ExpRational 

In this section, we give the proof of Theorem 6.1, except for the proof of a few lemmas, which have been presented in Section 6.5.1 for better readability.

Proof Overview. 

At a very high-level, the proof follows the outline of the proof for the Lanczos method. We first show that assuming the error in computing the inverse is small, $\hat{T}_k$ can be used to approximate small powers of $B = (I + A/k)^{-1}$ when restricted to the Krylov subspace, i.e. for all $t \le k$, $\|B^t v - V_k \hat{T}_k^t V_k^\top v\| \le \varepsilon_2$, for some small $\varepsilon_2$. This implies that we can bound the error in approximating $p((I + A/k)^{-1})$ using $p(\hat{T}_k)$, by $\varepsilon_2 \|p\|_1$, where $p$ is a polynomial of degree at most $k$. This is the most technical part of the error analysis because we need to capture the propagation of error through the various iterations of the algorithm. We overcome this difficulty by expressing the final error as a sum of $k$ terms, with the $i$-th term expressing how much error is introduced in the final candidate vector because of the error in the inverse computation during the $i$-th iteration. Unfortunately, the only way we know of bounding each of these terms is by tour de force. A part of this proof is to show that the spectrum of $\hat{T}_k$ cannot shift far from the spectrum of $B$.

To bound the error in the candidate vector output by the algorithm, i.e. $\|f(B)v - V_k f(\hat{T}_k) V_k^\top v\|$, we start by expressing $e^{-x}$ as the sum of a degree $k$ polynomial $p_k$ in $(1 + x/k)^{-1}$ and a remainder function $r_k$. We use the analysis from the previous paragraph to upper bound the error in the polynomial part by $\varepsilon_2 \|p\|_1$. We bound the contribution of the remainder term to the error by bounding $\|r_k(B)\|$ and $\|r_k(\hat{T}_k)\|$. This step uses the fact that the eigenvalues of $r_k(\hat{T}_k)$ are $\{r_k(\lambda_i)\}_i$, where $\{\lambda_i\}_i$ are the eigenvalues of $\hat{T}_k$. To complete the error analysis, we use the polynomials $p_k^\star$ from Corollary 6.9 and bound its $\ell_1$ norm. Even though we do not know $p_k^\star$ explicitly, we can bound $\|p_k^\star\|_1$ indirectly by writing it as an interpolation polynomial and using that the values it assumes in $[0, 1]$ have to be small in magnitude.



Proof: For notational convenience, define $B = (I + A/k)^{-1}$. Since the computation of $Bv_i$ is not exact in each iteration, the eigenvalues of $\hat{T}_k$ need not be eigenvalues of $B$. Also, the corresponding lemma from the exact Lanczos analysis no longer holds, i.e., we can't guarantee that $V_k \hat{T}_k^t e_1$ is identical to $B^t v_0$. However, we can prove the following lemma that proves bounds on the spectrum of $\hat{T}_k$ and also bounds the norm of the difference between the vectors $V_k \hat{T}_k^t e_1$ and $B^t v_0$. This is the most important and technically challenging part of the proof.

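Lemma 6.18 (Approximate Computation with $\hat{T}_k$. Proof in Sec. 6.5.1) The coefficient matrix $\hat{T}_k$ generated satisfies the following:

1. The eigenvalues of $\hat{T}_k$ lie in the interval $\left[ (1 + \lambda_1(A)/k)^{-1} - \varepsilon_1\sqrt{k+1},\ (1 + \lambda_n(A)/k)^{-1} + \varepsilon_1\sqrt{k+1} \right]$.

2. For any $t \le k$, if $\varepsilon_1 \le \varepsilon_2/(8(k+1)^{5/2})$ and $\varepsilon_2 \le 1$, we have, $\|B^t v_0 - V_k \hat{T}_k^t e_1\| \le \varepsilon_2$.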

Here is an idea of the proof of the above lemma: Since, during every iteration of the algorithm, the computation of $Bv_i$ is approximate, we will express $BV_k$ in terms of $T_k$ and an error matrix $E$. This will allow us to express $\hat{T}_k$ in terms of $T_k$ and a different error matrix. The first part of the lemma will follow immediately from the guarantee of the Invert_A procedure.

For the second part, we first express $BV_k - V_k\hat{T}_k$ in terms of the error matrices defined above. Using this, we can write the telescoping sum $B^t V_k - V_k \hat{T}_k^t = \sum_{j=1}^{t} B^{t-j} (BV_k - V_k\hat{T}_k) \hat{T}_k^{j-1}$. We use triangle inequality and a tour de force calculation to bound each term. A complete proof is included in Section 6.5.1.
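The telescoping identity used in this proof sketch holds for arbitrary matrices of compatible shapes, which makes it easy to verify numerically. The function name `telescoping_sum` below is hypothetical; the sketch simply evaluates $\sum_{j=1}^{t} B^{t-j}(BV - VT)T^{j-1}$.

```python
import numpy as np

def telescoping_sum(B, V, T, t):
    """Evaluate sum_{j=1}^{t} B^{t-j} (B V - V T) T^{j-1}.

    B is n x n, V is n x m, T is m x m. The sum telescopes to
    B^t V - V T^t.
    """
    R = B @ V - V @ T
    total = np.zeros(V.shape)
    for j in range(1, t + 1):
        left = np.linalg.matrix_power(B, t - j)
        right = np.linalg.matrix_power(T, j - 1)
        total = total + left @ R @ right
    return total
```

Each summand is $B^{t-j+1} V T^{j-1} - B^{t-j} V T^{j}$, so consecutive terms cancel and only $B^t V - V T^t$ remains; the analysis then bounds every summand separately.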

As a simple corollary, we can bound the error in the computation of the polynomial, in terms of the $\ell_1$ norm of the polynomial being computed.

Corollary 6.19 (Approximate Polynomial Computation. Proof in Sec. 16.5.1) For any polynomial p 
of degree at most k, ifci < £2/ {2{k + 1)^'^^) and £2 < 1, 

\[ \|p(B)v_0 - V_k p(\hat{T}_k) e_1\| \le \varepsilon_2 \|p\|_1 . \]

Using this corollary, we can prove an analogue of Lemma 6.6 giving error bounds on the procedure in terms of degree $k$ polynomial approximations. The proof is very similar and is based on writing $f$ as a sum of a degree $k$ polynomial and an error function.

Lemma 6.20 (Polynomial Approximation for ExpRational. Proof in Sec. 6.5.1) Let $V_k$ be the orthonormal basis and $\hat{T}_k$ be the matrix of coefficients generated by ExpRational. Let $f$ be any function such that $f(B)$ and $f(\hat{T}_k)$ are defined. Define $r_k(x) = f(x) - p(x)$. Then,

\[ \|f(B)v_0 - V_k f(\hat{T}_k) e_1\| \le \min_{p \in \Sigma_k} \left( \varepsilon_2 \|p\|_1 + \max_{\lambda \in \Lambda(B)} |r_k(\lambda)| + \max_{\lambda \in \Lambda(\hat{T}_k)} |r_k(\lambda)| \right) . \qquad (8) \]

In order to control the second error term in the above lemma, we need to bound the eigenvalues of $\hat{T}_k$, which is provided by Lemma 6.18.

For our application, $f(t) = f_k(t) = \exp(k \cdot (1 - 1/t))$ so that $f_k((1 + x/k)^{-1}) = \exp(-x)$. This function is discontinuous at $t = 0$. Under exact computation of the inverse, the eigenvalues of $\hat{T}_k$ would be the same as the eigenvalues of $B$ and hence would lie in $(0, 1]$. Unfortunately, due to the errors, the eigenvalues of $\hat{T}_k$ could be outside the interval. Since $f_k$ is discontinuous at 0, and goes to infinity for small negative values, in order to get a reasonable approximation to $f_k$, we will ensure that the eigenvalues of $\hat{T}_k$ are strictly positive, i.e., $\varepsilon_1\sqrt{k+1} < (1 + 1/k \cdot \lambda_1(A))^{-1}$.

We will use the polynomials $p_k^\star$ from Corollary 6.9 in Lemma 6.20 to bound the final error. We will require the following lemma to bound the $\ell_1$-norm of $p_k^\star$.

Lemma 6.21 ($\ell_1$-norm Bound. Proof in Sec. 6.5.1) Given a polynomial $p$ of degree $k$ such that $p(0) = 0$ and

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p\left((1 + x/k)^{-1}\right) \right| \le 1 , \]

we must have $\|p\|_1 \le (2k)^{k+1}$.

This lemma is proven by expressing $p$ as the interpolation polynomial on the values attained by $p$ at the $k + 1$ points $0, 1/k, \ldots, k/k$, which allows us to express the coefficients in terms of these values. We can bound these values, and hence, the coefficients, since we know that $p$ isn't too far from the exponential function. A complete proof is included in Section 6.5.1.

Corollary 6.9 shows that $p_k^\star(t)$ is a good uniform approximation to $e^{-k/t + k}$ over the interval $(0, 1]$. Since $\Lambda(B) \subseteq (0, 1]$, this will help us bound the second error term in Equation (8). Since $\hat{T}_k$ can have eigenvalues larger than 1, we need to bound the error in approximating $f_k(t)$ by $p_k^\star(t)$ over an interval $(0, \beta]$, where $\beta \ge 1$. The following lemma gives us the required error bound. The proof of this lemma bounds the error over $[1, \beta]$ by applying triangle inequality and bounding the change in $f_k$ and $p$ over $[1, \beta]$ separately.
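The extended-interval bound stated next (Lemma 6.22) can be sanity-checked on a grid. The sketch below is only a numerical illustration with hypothetical names (`f_k`, `extended_interval_check`): suprema are replaced by maxima over finitely many sample points, and the left endpoint `1e-3` stands in for the open end of $(0, 1]$.

```python
import numpy as np

def f_k(t, k):
    """f_k(t) = exp(k (1 - 1/t)) for t > 0."""
    return np.exp(k * (1.0 - 1.0 / t))

def extended_interval_check(coeffs, k, beta, grid=2000):
    """Return (lhs, rhs) of the bound for p(t) = sum_i coeffs[i] * t**i.

    lhs approximates sup over (0, beta] of |p - f_k|;
    rhs = ||p||_1 (beta^k - 1) + (f_k(beta) - f_k(1)) + sup over (0, 1].
    """
    coeffs = np.asarray(coeffs, dtype=float)
    p = np.polynomial.Polynomial(coeffs)
    t01 = np.linspace(1e-3, 1.0, grid)      # sample points in (0, 1]
    t1b = np.linspace(1.0, beta, grid)      # sample points in [1, beta]
    sup01 = np.max(np.abs(p(t01) - f_k(t01, k)))
    sup1b = np.max(np.abs(p(t1b) - f_k(t1b, k)))
    lhs = max(sup01, sup1b)
    l1_norm = np.sum(np.abs(coeffs))        # l1 norm of the coefficients
    rhs = l1_norm * (beta**k - 1.0) + (f_k(beta, k) - f_k(1.0, k)) + sup01
    return lhs, rhs
```

For any coefficient vector of length $k + 1$ and any $\beta \ge 1$, the returned `lhs` should not exceed `rhs`; the gap shows how much slack the triangle-inequality argument leaves for a particular polynomial.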

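Lemma 6.22 (Approximation on Extended Interval. Proof in Sec. 6.5.1) For any $\beta \ge 1$, any degree $k$ polynomial $p$ satisfies,

\[ \sup_{t \in (0,\beta]} |p(t) - f_k(t)| \le \|p\|_1 \cdot (\beta^k - 1) + (f_k(\beta) - f_k(1)) + \sup_{t \in (0,1]} |p(t) - f_k(t)| . \]

We bound the final error using the polynomial $p_k^\star$ in Equation (8). We will use the above lemma for $\beta = 1 + \varepsilon_1\sqrt{k+1}$ and assume that $\varepsilon_1\sqrt{k+1} < (1 + 1/k \cdot \lambda_1(A))^{-1}$.

\[ \begin{aligned}
\|f(B)v_0 - V_k f(\hat{T}_k) e_1\| &\le \varepsilon_2 \|p_k^\star\|_1 + \max_{\lambda \in \Lambda(B)} |r_k(\lambda)| + \max_{\lambda \in \Lambda(\hat{T}_k)} |r_k(\lambda)| \\
&\le \varepsilon_2 \|p_k^\star\|_1 + \sup_{\lambda \in (0,1]} |(f_k - p_k^\star)(\lambda)| + \sup_{\lambda \in (0,\beta]} |(f_k - p_k^\star)(\lambda)| \\
&\qquad \text{(Since } \Lambda(B) \subseteq (0,1] \text{ and } \Lambda(\hat{T}_k) \subseteq (0,\beta] \text{)} \\
&\le \varepsilon_2 \|p_k^\star\|_1 + \sup_{t \in (0,1]} |p_k^\star(t) - f_k(t)| \\
&\qquad + \|p_k^\star\|_1 \cdot (\beta^k - 1) + (f_k(\beta) - f_k(1)) + \sup_{t \in (0,1]} |p_k^\star(t) - f_k(t)| \\
&= \|p_k^\star\|_1 \cdot (\varepsilon_2 + \beta^k - 1) + \left( \exp(k(\beta - 1)/\beta) - 1 \right) + 2 \sup_{t \in (0,1]} |p_k^\star(t) - f_k(t)| .
\end{aligned} \]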

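Given $\delta \le 1$, we plug in the following parameters,

\[ k = \max\{k_0,\ \log_2 8c_1/\delta + 2 \log_2 \log_2 8c_1/\delta\} = O(\log 1/\delta) , \]

\[ \varepsilon_1 = \delta/32 \cdot (k+1)^{-5/2} \cdot (1 + 1/k \cdot \lambda_1(A))^{-1} \cdot (2k)^{-k-1} , \quad \beta = 1 + \varepsilon_1\sqrt{k+1} , \quad \varepsilon_2 = 8(k+1)^{5/2} \varepsilon_1 , \]

where $k_0, c_1$ are the constants given by Corollary 6.9. Note that these parameters satisfy the condition $\varepsilon_1\sqrt{k+1} < (1 + 1/k \cdot \lambda_1(A))^{-1}$. Corollary 6.9 implies that $p_k^\star(0) = 0$ and

\[ \sup_{t \in (0,1]} |p_k^\star(t) - f_k(t)| \le \frac{\delta}{8} \cdot \frac{\log_2 8c_1/\delta + 2 \log_2 \log_2 8c_1/\delta}{(\log_2 8c_1/\delta)^2} \le \frac{\delta}{8} \cdot \frac{1}{\log_2 8c_1/\delta} \left( 1 + 2 \cdot \frac{\log_2 \log_2 8c_1/\delta}{\log_2 8c_1/\delta} \right) \le \frac{\delta}{8} \cdot \frac{1}{3} \cdot 3 \le \frac{\delta}{8} , \qquad (9) \]

where the last inequality uses $\delta \le 1 \le c_1$ and $\log_2 x \le x, \forall x \ge 0$. Thus, we can use Lemma 6.21 to conclude that $\|p_k^\star\|_1 \le (2k)^{k+1}$.

We can simplify the following expressions.

\[ \exp(k(\beta - 1)/\beta) - 1 \le \exp(k \varepsilon_1 \sqrt{k+1}) - 1 \le \exp(\varepsilon_2/8) - 1 \le (1 + \varepsilon_2/4) - 1 = \varepsilon_2/4 , \]

\[ \beta^k - 1 = (1 + \varepsilon_1\sqrt{k+1})^k - 1 \le \exp(k \cdot \varepsilon_1\sqrt{k+1}) - 1 \le \varepsilon_2/4 . \]

Thus the total error $\|u - \exp(-A)v\| = \|f(B)v_0 - V_k f(\hat{T}_k) e_1\| \le (2k)^{k+1} \cdot 2\varepsilon_2 + \varepsilon_2 + \delta/4 \le \delta$.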

Running Time.

The running time for the procedure is dominated by $k$ calls to the Invert_A procedure with parameters $k$ and $\varepsilon_1$, computation of at most $k^2$ dot-products and the exponentiation of $\hat{T}_k$. The exponentiation of $\hat{T}_k$ can be done in time $O(k^3)$ [26]. Thus the total running time is $O(T^{inv}_{A,k,\varepsilon_1} \cdot k + n \cdot k^2 + k^3)$. This completes the proof of Theorem 6.1. ■

6.5.1 Remaining Proofs

In this section, we give the remaining proofs in Section 6.5.

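Lemma 6.23 (Lemma 6.18 Restated) The coefficient matrix $\hat{T}_k$ generated satisfies the following:

1. The eigenvalues of $\hat{T}_k$ lie in the interval $\left[ (1 + \lambda_1(A)/k)^{-1} - \varepsilon_1\sqrt{k+1},\ (1 + \lambda_n(A)/k)^{-1} + \varepsilon_1\sqrt{k+1} \right]$.

2. For any $t \le k$, if $\varepsilon_1 \le \varepsilon_2/(8(k+1)^{5/2})$ and $\varepsilon_2 \le 1$, we have, $\|B^t v_0 - V_k \hat{T}_k^t e_1\| \le \varepsilon_2$.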



Proof: Given a vector $y$, a positive integer $k$ and real parameter $\varepsilon_1 > 0$, Invert_A$(y, k, \varepsilon_1)$ returns a vector $u_1$ such that $\|By - u_1\| \le \varepsilon_1 \|y\|$, in time $T^{inv}_{A,k,\varepsilon_1}$. Thus, for each $i$, the vector $w_i$ satisfies $\|Bv_i - w_i\| \le \varepsilon_1 \|v_i\| = \varepsilon_1$. Also define $u_i$ as $u_i = Bv_i - w_i$. Thus, we get, $\|u_i\| \le \varepsilon_1$. Let $E$ be the $n \times (k + 1)$ matrix with its columns being $u_0, \ldots, u_k$. We can write the following recurrence,

\[ BV_k = V_k T_k + E + \alpha_{k,k+1} v_{k+1} e_{k+1}^\top , \qquad (10) \]

where each column of $E$ has $\ell_2$ norm at most $\varepsilon_1$. Note that we continue to do complete orthonormalization, so $V_k^\top V_k = I_{k+1}$. Thus, $T_k$ is not tridiagonal, but rather Upper Hessenberg, i.e., $(T_k)_{ij} = 0$ whenever $i > j + 1$.

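Multiplying both sides of Equation (10) by $V_k^\top$, we get $T_k = V_k^\top B V_k - V_k^\top E$. This implies,

\[ \begin{aligned}
\hat{T}_k &= V_k^\top B V_k - 1/2 \cdot (V_k^\top E + E^\top V_k) \qquad (11) \\
&= V_k^\top (V_k T_k + E + \alpha_{k,k+1} v_{k+1} e_{k+1}^\top) - 1/2 \cdot (V_k^\top E + E^\top V_k) \qquad \text{(Using (10))} \\
&= T_k + 1/2 \cdot (V_k^\top E - E^\top V_k) . \qquad (12)
\end{aligned} \]

Define $E_1 = 1/2 \cdot (V_k^\top E + E^\top V_k)$. Thus, using Equation (11), $\hat{T}_k = V_k^\top B V_k - E_1$. Let us first bound the norm of $E_1$.

\[ \|E_1\| \le 1/2 \cdot (\|V_k^\top E\| + \|E^\top V_k\|) \le 1/2 \cdot (\|E\| + \|E^\top\|) \le \|E\|_F \le \varepsilon_1\sqrt{k+1} . \]

Since $\hat{T}_k = V_k^\top B V_k - E_1$, we have,

\[ \begin{aligned}
\lambda_{\max}(\hat{T}_k) &\le \lambda_1(B) + \|E_1\| \le (1 + 1/k \cdot \lambda_n(A))^{-1} + \varepsilon_1\sqrt{k+1} , \\
\lambda_{\min}(\hat{T}_k) &\ge \lambda_n(B) - \|E_1\| \ge (1 + 1/k \cdot \lambda_1(A))^{-1} - \varepsilon_1\sqrt{k+1} .
\end{aligned} \]

(We use $\lambda_{\max}$ and $\lambda_{\min}$ for the largest and smallest eigenvalues of $\hat{T}_k$ respectively in order to avoid confusion since $\hat{T}_k$ is a $(k + 1) \times (k + 1)$ matrix and not an $n \times n$ matrix.)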
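First, let us compute $BV_k - V_k\hat{T}_k$.

\[ \begin{aligned}
BV_k - V_k\hat{T}_k &= V_k T_k + E + \alpha_{k,k+1} v_{k+1} e_{k+1}^\top - V_k \left( T_k + 1/2 \cdot (V_k^\top E - E^\top V_k) \right) \\
&= (I - 1/2 \cdot V_k V_k^\top) E + 1/2 \cdot V_k E^\top V_k + \alpha_{k,k+1} v_{k+1} e_{k+1}^\top . \qquad (13)
\end{aligned} \]

Now,

\[ \begin{aligned}
\|B^t v_0 - V_k \hat{T}_k^t e_1\| &= \left\| \sum_{j=1}^{t} B^{t-j} (BV_k - V_k\hat{T}_k) \hat{T}_k^{j-1} e_1 \right\| \qquad \text{(Telescoping sum)} \\
&\le \left\| \sum_{j=1}^{t} B^{t-j} \left( (I - 1/2 \cdot V_k V_k^\top) E + 1/2 \cdot V_k E^\top V_k \right) \hat{T}_k^{j-1} e_1 \right\| \\
&\qquad + \left\| \sum_{j=1}^{t} B^{t-j} \left( \alpha_{k,k+1} v_{k+1} e_{k+1}^\top \right) \hat{T}_k^{j-1} e_1 \right\| , \qquad (14)
\end{aligned} \]

where the inequality uses Equation (13) and the triangle inequality.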

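We can bound the first term in Equation (14) as follows.

\[ \begin{aligned}
&\left\| \sum_{j=1}^{t} B^{t-j} \left( (I - 1/2 \cdot V_k V_k^\top) E + 1/2 \cdot V_k E^\top V_k \right) \hat{T}_k^{j-1} e_1 \right\| \\
&\quad \le \sum_{j=1}^{t} \left\| (I - 1/2 \cdot V_k V_k^\top) E + 1/2 \cdot V_k E^\top V_k \right\| \|\hat{T}_k\|^{j-1} \qquad \text{(Using } \|B\| \le 1 \text{)} \\
&\quad \le \left( \sum_{j=1}^{t} (1 + \varepsilon_1\sqrt{k+1})^{j-1} \right) \left\| (I - 1/2 \cdot V_k V_k^\top) E + 1/2 \cdot V_k E^\top V_k \right\| \qquad \text{(Using } \|\hat{T}_k\| \le 1 + \varepsilon_1\sqrt{k+1} \text{)} \\
&\quad \le t (1 + \varepsilon_1\sqrt{k+1})^{t-1} \left( \|I - 1/2 \cdot V_k V_k^\top\| \|E\| + 1/2 \cdot \|V_k\| \|E^\top\| \|V_k\| \right) \\
&\quad \le 2 t \varepsilon_1 \sqrt{k+1} \, (1 + \varepsilon_1\sqrt{k+1})^{t-1} . \qquad (15)
\end{aligned} \]

The second term in Equation (14) can be bounded as follows.

\[ \begin{aligned}
&\left\| \sum_{j=1}^{t} B^{t-j} \left( \alpha_{k,k+1} v_{k+1} e_{k+1}^\top \right) \hat{T}_k^{j-1} e_1 \right\| \le |\alpha_{k,k+1}| \sum_{j=1}^{t} \|B\|^{t-j} \left| e_{k+1}^\top \hat{T}_k^{j-1} e_1 \right| \\
&\quad \le (1 + \varepsilon_1) \sum_{j=1}^{t} \|B\|^{t-j} \left| e_{k+1}^\top \left( T_k + 1/2 \cdot (V_k^\top E - E^\top V_k) \right)^{j-1} e_1 \right| \\
&\qquad \text{(Using } |\alpha_{k,k+1}| \le \|w_k\| \le \|Bv_k\| + \varepsilon_1 \le 1 + \varepsilon_1 \text{ and (12))} \\
&\quad = (1 + \varepsilon_1) \sum_{j=1}^{t} \|B\|^{t-j} \left| e_{k+1}^\top \left( \left( T_k + 1/2 \cdot (V_k^\top E - E^\top V_k) \right)^{j-1} - T_k^{j-1} \right) e_1 \right| \\
&\qquad \text{(Using } e_{k+1}^\top T_k^r e_1 = 0 \text{ for } r < k \text{ as } T_k \text{ is Upper Hessenberg)} \\
&\quad \le (1 + \varepsilon_1) \sum_{j=1}^{t} \sum_{i=1}^{j-1} \binom{j-1}{i} \left\| 1/2 \cdot (V_k^\top E - E^\top V_k) \right\|^{i} \|T_k\|^{j-1-i} \\
&\qquad \text{(Using sub-multiplicativity of } \|\cdot\| \text{ and } \|B\| \le 1 \text{)} \\
&\quad \le (1 + \varepsilon_1) \sum_{j=1}^{t} \sum_{i=1}^{j-1} \binom{j-1}{i} (\varepsilon_1\sqrt{k+1})^{i} (1 + \varepsilon_1\sqrt{k+1})^{j-1-i} \\
&\qquad \text{(Using } \|T_k\| \le 1 + \varepsilon_1\sqrt{k+1} \text{ and } \|1/2 \cdot (V_k^\top E - E^\top V_k)\| \le \varepsilon_1\sqrt{k+1} \text{)} \\
&\quad \le (1 + \varepsilon_1) \sum_{j=1}^{t} \left( (1 + 2\varepsilon_1\sqrt{k+1})^{j-1} - 1 \right) \\
&\quad \le t (1 + \varepsilon_1) \left( (1 + 2\varepsilon_1\sqrt{k+1})^{t-1} - 1 \right) . \qquad (16)
\end{aligned} \]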


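Combining Equations (14), (15) and (16), we get,

\[ \begin{aligned}
\|B^t v_0 - V_k \hat{T}_k^t e_1\| &\le 2 t \varepsilon_1 \sqrt{k+1} \, (1 + \varepsilon_1\sqrt{k+1})^{t-1} + t (1 + \varepsilon_1) \left( (1 + 2\varepsilon_1\sqrt{k+1})^{t-1} - 1 \right) \\
&\le 2 t \varepsilon_1 \sqrt{k+1} \, e^{\varepsilon_1 (t-1) \sqrt{k+1}} + t e^{\varepsilon_1} \left( e^{2\varepsilon_1 (t-1) \sqrt{k+1}} - 1 \right) \qquad \text{(Using } 1 + x \le e^x \text{)} \\
&\le 2 t \varepsilon_1 \sqrt{k+1} \, e^{\varepsilon_1 (t-1) \sqrt{k+1}} + t \left( e^{2\varepsilon_1 t \sqrt{k+1}} - 1 \right) \\
&\le \varepsilon_2/4 \cdot e^{\varepsilon_2/8(k+1)} + k \left( e^{\varepsilon_2/4(k+1)} - 1 \right) \qquad \text{(Using } \varepsilon_1 \le \varepsilon_2/8(k+1)^{5/2} \text{ and } t \le k \text{)} \\
&\le \varepsilon_2/4 \cdot e^{\varepsilon_2/8(k+1)} + k \cdot \varepsilon_2/4(k+1) \cdot (1 + \varepsilon_2/4(k+1)) \qquad \text{(Using } e^x \le 1 + x + x^2 \text{ for } 0 \le x \le 1 \text{)} \\
&\le \varepsilon_2/2 + \varepsilon_2/2 \le \varepsilon_2 \qquad \text{(Using } \varepsilon_2 \le 1 \text{ and } k \ge 0 \text{)} .
\end{aligned} \]

This proves the lemma. ■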

Corollary 6.24 (Corollary |6.19| Restated) For any polynomial p of degree at most k, if £1 < £2/ (2(7: + 
1)'/^) and £2 < 1, 



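\[ \|p(B)v_0 - V_k p(\hat{T}_k) e_1\| \le \varepsilon_2 \|p\|_1 . \]

Proof: Suppose $p(x)$ is the polynomial $\sum_{t=0}^{k} a_t \cdot x^t$.

\[ \|p(B)v_0 - V_k p(\hat{T}_k) e_1\| = \left\| \sum_{t=0}^{k} a_t \cdot B^t v_0 - V_k \sum_{t=0}^{k} a_t \cdot \hat{T}_k^t e_1 \right\| \le \sum_{t=0}^{k} |a_t| \cdot \|B^t v_0 - V_k \hat{T}_k^t e_1\| \le \varepsilon_2 \sum_{t=0}^{k} |a_t| = \varepsilon_2 \|p\|_1 , \]

where the last inequality follows from the previous lemma as $\varepsilon_2, \varepsilon_1$ satisfy the required conditions.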



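Lemma 6.25 (Lemma 6.20 Restated) Let $V_k$ be the orthonormal basis and $\hat{T}_k$ be the matrix of coefficients generated by the above procedure. Let $f$ be any function such that $f(B)$ and $f(\hat{T}_k)$ are defined. Then,

\[ \|f(B)v_0 - V_k f(\hat{T}_k) e_1\| \le \min_{p \in \Sigma_k} \left( \varepsilon_2 \|p\|_1 + \max_{\lambda \in \Lambda(B)} |r_k(\lambda)| + \max_{\lambda \in \Lambda(\hat{T}_k)} |r_k(\lambda)| \right) . \qquad (17) \]

Proof: Let $p$ be any degree $k$ polynomial. Let $r_k = f - p$. We express $f$ as $p + r_k$ and use the previous lemma to bound the error in approximating $p(B)v_0$ by $V_k p(\hat{T}_k) e_1$.

\[ \begin{aligned}
\|f(B)v_0 - V_k f(\hat{T}_k) e_1\| &\le \|p(B)V_k e_1 - V_k p(\hat{T}_k) e_1\| + \|r_k(B)V_k e_1 - V_k r_k(\hat{T}_k) e_1\| \\
&\le \varepsilon_2 \|p\|_1 + \|r_k(B)V_k e_1\| + \|V_k r_k(\hat{T}_k) e_1\| \\
&\le \varepsilon_2 \|p\|_1 + \|r_k(B)\| + \|r_k(\hat{T}_k)\| \\
&\le \varepsilon_2 \|p\|_1 + \max_{\lambda \in \Lambda(B)} |r_k(\lambda)| + \max_{\lambda \in \Lambda(\hat{T}_k)} |r_k(\lambda)| .
\end{aligned} \]

Minimizing over $p$ gives us our lemma.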

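Lemma 6.26 (Lemma 6.21 Restated) Given a polynomial $p$ of degree $k$ such that $p(0) = 0$ and

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p\left((1 + x/k)^{-1}\right) \right| \le 1 , \]

we must have $\|p\|_1 \le (2k)^{k+1}$.

Proof: We know that $p(0) = 0$. Interpolating at the $k + 1$ points $t = 0, 1/k, 2/k, \ldots, 1$, we can use Lagrange's interpolation formula to give,

\[ p(t) = \sum_{i=0}^{k} \frac{\prod_{0 \le j \le k,\, j \ne i} (t - j/k)}{\prod_{0 \le j \le k,\, j \ne i} (i/k - j/k)} \cdot p(i/k) . \]

The above identity is easily verified by evaluating the expression at the interpolation points and noting that it is a degree $k$ polynomial agreeing with $p$ at $k + 1$ points. Thus, if we were to write $p(t) = \sum_{l=1}^{k} a_l \cdot t^l$ (note that $a_0 = 0$), we can express the coefficients $a_l$ as follows.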



■ h-i/k 



^ o</i<-<rt_,<t 

'^^ ~ hi Ylo<i<k,i^i{'/k-i/k) 



-p{i/k).
Applying triangle inequality, and noting that $p(t)$ is a 1-uniform approximation to $e^{-k/t + k}$ for $t \in (0, 1]$, we get,

Thus, we can bound the $\ell_1$ norm of $p$ as follows,

k-l 



1=1 1=1 



Lemma 6.27 (Lemma 6.22 Restated) For any $\beta \ge 1$, any degree $k$ polynomial $p$ satisfies,

\[ \sup_{t \in (0,\beta]} |p(t) - f_k(t)| \le \|p\|_1 \cdot (\beta^k - 1) + (f_k(\beta) - f_k(1)) + \sup_{t \in (0,1]} |p(t) - f_k(t)| . \]

Proof: Given a degree $k$ polynomial $p$ that approximates $f_k$ over $(0, 1]$, we wish to bound the approximation error over $(0, \beta]$ for $\beta \ge 1$. We will split the error bound over $(0, 1]$ and $[1, \beta]$. Since we know that $f_k(\beta) - f_k(1)$ is small, we will bound the error over $[1, \beta]$ by applying triangle inequality and bounding the change in $f_k$ and $p$ over $[1, \beta]$ separately.

Let $\beta \ge 1$. First, let us calculate $\sup_{t \in [1,\beta]} |p(t) - f_k(t)|$.

\[ \begin{aligned}
\sup_{t \in [1,\beta]} |p(t) - f_k(t)| &\le \sup_{t \in [1,\beta]} \left( |p(t) - p(1)| + |p(1) - f_k(1)| + |f_k(1) - f_k(t)| \right) \\
&\le \sup_{t \in [1,\beta]} \left( \|p\|_1 \cdot \max_{0 \le i \le k} |t^i - 1^i| + |p(1) - f_k(1)| + |f_k(1) - f_k(t)| \right) \\
&\le \sup_{t \in [1,\beta]} \left( \|p\|_1 \cdot \max_{0 \le i \le k} |t^i - 1^i| \right) + |p(1) - f_k(1)| + \sup_{t \in [1,\beta]} |f_k(1) - f_k(t)| \\
&\le \|p\|_1 \cdot (\beta^k - 1) + |p(1) - f_k(1)| + \sup_{t \in [1,\beta]} |f_k(1) - f_k(t)| \\
&\qquad \text{(Since } t \ge 1 \text{ and } t^i \text{ is increasing for } t \ge 0 \text{)} \\
&\le \|p\|_1 \cdot (\beta^k - 1) + |p(1) - f_k(1)| + (f_k(\beta) - f_k(1)) \\
&\qquad \text{(Since } f_k(t) \text{ is an increasing function for } t \ge 0 \text{)} .
\end{aligned} \]

Now, we can bound the error over the whole interval as follows.

\[ \begin{aligned}
\sup_{t \in (0,\beta]} |p(t) - f_k(t)| &= \max\left\{ \sup_{t \in (0,1]} |p(t) - f_k(t)| ,\ \sup_{t \in [1,\beta]} |p(t) - f_k(t)| \right\} \\
&\le \max\left\{ \sup_{t \in (0,1]} |p(t) - f_k(t)| ,\ \|p\|_1 \cdot (\beta^k - 1) + |p(1) - f_k(1)| + f_k(\beta) - f_k(1) \right\} \\
&\le \|p\|_1 \cdot (\beta^k - 1) + (f_k(\beta) - f_k(1)) + \sup_{t \in (0,1]} |p(t) - f_k(t)| .
\end{aligned} \]

7 Uniform Approximations to $e^{-x}$

In this section, we discuss uniform approximations to $e^{-x}$ and give a proof of Theorem 1.5, which shows the existence of polynomials that approximate $e^{-x}$ uniformly over the interval $[a, b]$, whose degree grows as $\sqrt{b - a}$, and also gives a lower bound stating that this dependence is necessary. We restate a more precise version of the theorem here for completeness.

Theorem 7.1 (Uniform Approximation to $e^{-x}$)

• Upper Bound. For every $0 \le a < b$, and a given error parameter $0 < \delta \le 1$, there exists a
polynomial $p_{a,b,\delta}$ that satisfies,

\[ \sup_{x \in [a,b]} |e^{-x} - p_{a,b,\delta}(x)| \le \delta \cdot e^{-a}, \]

and has degree $O\left( \sqrt{\max\{\log^2 1/\delta,\ (b - a) \cdot \log 1/\delta\}} \cdot (\log 1/\delta) \cdot \log\log 1/\delta \right)$.

• Lower Bound. For every $0 \le a < b$ such that $a + \log_e 4 \le b$, and $\delta \in (0, 1/8]$, any polynomial
$p(x)$ that approximates $e^{-x}$ uniformly over the interval $[a, b]$ up to an error of $\delta \cdot e^{-a}$, must have
degree at least $\frac{1}{2} \cdot \sqrt{b - a}$.

Organization 

We first discuss a few preliminaries (Section 7.1), then discuss relevant results that were already
known and compare our result to the existing lower bounds (Section 7.2). Finally, we give a proof
of the upper bound in Theorem 1.5 in Section 7.3 and of the lower bound in Section 7.4. Readers
familiar with standard results in approximation theory can skip directly to the proofs in Sections 7.3
and 7.4.

7.1 Preliminaries 

Given an interval $[a, b]$, we are looking for low-degree polynomials (or rational functions) that
approximate the function $e^{-x}$ in the sup norm over the interval.

Definition 7.2 ($\delta$-Approximation) A function $g$ is called a $\delta$-approximation to a function $f$ over an
interval $I$, if $\sup_{x \in I} |f(x) - g(x)| \le \delta$.

Such approximations are known as uniform approximations in approximation theory and have been 
studied quite extensively. We will consider both finite and infinite intervals I. 
For any positive integer $k$, let $\Sigma_k$ denote the set of all degree $k$ polynomials. We also need to define
the $\ell_1$ norm of a polynomial.

Definition 7.3 ($\ell_1$ Norm of a Polynomial) Given a degree $k$ polynomial $p = \sum_{i=0}^{k} a_i \cdot x^i$, the $\ell_1$ norm
of $p$, denoted as $\|p\|_1$, is defined as $\|p\|_1 = \sum_{i \ge 0} |a_i|$.

7.2 Known Approximation Results and Discussion

Approximating the exponential is a classic question in Approximation Theory, see e.g. [10]. We
ask the following question:

Question: Given $\delta \le 1$ and $a < b$, what is the smallest degree of a polynomial that is a
$\delta \cdot e^{-a}$-approximation to $e^{-x}$ over the interval $[a, b]$?

This question has been studied in the following form: Given $\lambda$, what is the best low degree
polynomial (or rational function) approximation to $e^{\lambda x}$ over $[-1, 1]$? In a sense, these questions
are equivalent, as is shown by the following lemma, proved using a linear shift of variables. A
proof is included in Section 7.5.

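Lemma 7.4 (Linear Variable Shift for Approximation) For any non-negative integers $k$ and $l$ and real
numbers $b > a$,

\[ \min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{t \in [a,b]} \left| e^{-t} - \frac{p_k(t)}{q_l(t)} \right|
   = e^{-\frac{b+a}{2}} \cdot \min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{x \in [-1,1]} \left| e^{\frac{(b-a)}{2} x} - \frac{p_k(x)}{q_l(x)} \right|. \]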
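The change of variables behind this lemma can be checked numerically. The following is a minimal sketch, assuming the substitution $t = (b+a)/2 - (b-a)/2 \cdot x$ used in the proof in Section 7.5; the function name and the sample interval are illustrative and not part of the original text.

```python
import math

def shifted_exponential(t, a, b):
    """Map t in [a, b] to x in [-1, 1] via t = (b+a)/2 - (b-a)/2 * x and
    return x together with e^{-(b+a)/2} * e^{(b-a)/2 * x}, which should equal e^{-t}."""
    x = ((b + a) / 2 - t) / ((b - a) / 2)  # inverse of the substitution
    return x, math.exp(-(b + a) / 2) * math.exp((b - a) / 2 * x)

a, b = 1.0, 5.0  # illustrative interval
for t in (a, 2.0, 3.5, b):
    x, value = shifted_exponential(t, a, b)
    assert -1.0 <= x <= 1.0                                   # x stays inside [-1, 1]
    assert math.isclose(value, math.exp(-t), rel_tol=1e-12)   # same function value
```

Note that $t = a$ maps to $x = 1$ and $t = b$ maps to $x = -1$, so the orientation of the interval is reversed, which does not affect the sup norm.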



Using the above lemma, we can translate the known results to our setting. As a starting point,
we could approximate $e^{-x}$ by truncating the Taylor series expansion of the exponential. We state
the approximation achieved in the following lemma. A proof is included in Section 7.5.

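Lemma 7.5 (Taylor Approximation) The degree $k$ polynomial obtained by truncating Taylor's expansion
of $e^{-t}$ around the point $\frac{b+a}{2}$ is a uniform approximation to $e^{-t}$ on the interval $[a, b]$ up to an error
of

\[ e^{-\frac{b+a}{2}} \cdot \sum_{i=k+1}^{\infty} \frac{1}{i!} \left( \frac{b - a}{2} \right)^i, \]

which is smaller than $\delta \cdot e^{-\frac{b+a}{2}}$ for $k \ge \max\left\{ \frac{e^2 (b-a)}{2},\ \log 1/\delta \right\}$.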
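As a sanity check of the lemma, the sketch below truncates the Taylor expansion of $e^{-t}$ around the midpoint and measures the error on a grid. It assumes the degree threshold reads $k \ge \max\{e^2(b-a)/2, \log 1/\delta\}$ with natural logarithms, as used in the proof in Section 7.5; the interval, $\delta$, and the grid size are arbitrary illustrative choices.

```python
import math

def taylor_exp_neg(t, center, k):
    """Degree-k Taylor polynomial of e^{-t} around `center`, evaluated at t."""
    return math.exp(-center) * sum((center - t) ** i / math.factorial(i) for i in range(k + 1))

def sup_error(a, b, k, samples=2001):
    """Grid estimate of the sup over [a, b] of |e^{-t} - Taylor_k(t)|."""
    center = (a + b) / 2
    grid = (a + (b - a) * j / (samples - 1) for j in range(samples))
    return max(abs(math.exp(-t) - taylor_exp_neg(t, center, k)) for t in grid)

a, b, delta = 0.0, 2.0, 1e-3  # illustrative values
k = math.ceil(max(math.e ** 2 * (b - a) / 2, math.log(1 / delta)))
err = sup_error(a, b, k)
target = delta * math.exp(-(a + b) / 2)
assert err <= target
```

The required degree here is driven by the term linear in $b - a$, which is the dependence that Theorem 7.1 improves to $\sqrt{b - a}$.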

A lower bound is known in the case where the size of the interval is fixed, i.e., $b - a = O(1)$.

Proposition 7.6 (Lower Bound for Polynomials over Fixed Interval, (3l[3l)) For any a, b G R such 
that $b - a$ is fixed, as $k$ goes to infinity, the best approximation achieved by a degree $k$ polynomial has error

In essence, this proposition states that if the size of the interval is fixed, the polynomials obtained by
truncating the Taylor series expansion achieve asymptotically the least error possible and hence,
the best asymptotic degree for achieving a $\delta \cdot e^{-\frac{b+a}{2}}$-approximation. In addition, Saff [31] also
shows that if, instead of polynomials, we allow rational functions where the degree of the
denominator is a constant, the degree required for achieving a $\delta \cdot e^{-\frac{b+a}{2}}$-approximation changes at most
by a constant.

These results indicate that tight bounds on the answer to our question should already be
known. In fact, at first thought, the optimality of the Taylor series polynomials seems to be in
contradiction with our results. However, note the two important differences:

1. The error in our theorem is $e^{-a} \cdot \delta$, whereas the Taylor series approximation involves error
$e^{-\frac{b+a}{2}} \cdot \delta$, which is smaller, and hence requires larger degree.

2. Moreover, the lower bound applies only when the length of the interval $(b - a)$ is constant,
in which case, our theorem says that the required degree is $\mathrm{poly}(\log 1/\delta)$, which is $\Omega(\log 1/\delta)$,
in accordance with the lower bound.

If the length of the interval $[a, b]$ grows unbounded (as is the case for our applications to the
Balanced Separator problem in the previous sections), the main advantage of using polynomials
from Theorem 7.1 is the improvement in the degree from linear in $(b - a)$ to $\sqrt{b - a}$.



7.3 Proof of Upper Bound in Theorem 1.5

In this section, we use Theorem 6.8 by Saff, Schönhage and Varga [30], or more specifically,
Corollary 6.9, to give a proof of the upper bound result in Theorem 1.5. We restate Corollary 6.9
for completeness.

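Corollary 7.7 (Corollary 6.9 Restated, [30]) There exist constants $c_1 \ge 1$ and $k_0$ such that, for any
integer $k \ge k_0$, there exists a polynomial $p_k^*(x)$ of degree $k$ such that $p_k^*(0) = 0$, and,

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p_k^*(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p_k^*\left( (1 + x/k)^{-1} \right) \right| \le c_1 k \cdot 2^{-k}. \tag{18} \]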



Our approach is to compose the polynomial $p_k^*$ given by Corollary 7.7 with polynomials
approximating $(1 + x/k)^{-1}$, to construct polynomials approximating $e^{-x}$. We first show the existence
of polynomials approximating $x^{-1}$, and from these polynomials, we will derive approximations
to $(1 + x/k)^{-1}$.

Our goal is to find a polynomial $q$ of degree $k$ that minimizes $\sup_{x \in [a,b]} |q(x) - 1/x|$. We slightly
modify this optimization to minimizing $\sup_{x \in [a,b]} |x \cdot q(x) - 1|$. Note that $x \cdot q(x) - 1$ is a
polynomial of degree $k + 1$ which evaluates to $-1$ at $x = 0$, and conversely every polynomial that
evaluates to $-1$ at $0$ can be written as $x \cdot q(x) - 1$ for some $q$. So, this is equivalent to
minimizing $\sup_{x \in [a,b]} |q_1(x)|$, for a degree $k + 1$ polynomial $q_1$ such that $q_1(0) = -1$. By scaling and
multiplying by $-1$, this is equivalent to finding a polynomial $q_2$ that maximizes $q_2(0)$, subject to
$\sup_{x \in [a,b]} |q_2(x)| \le 1$. If we shift and scale the interval $[a, b]$ to $[-1, 1]$, the optimal solution to this
problem is known to be given by the well known Chebyshev polynomials. We put all these ideas
together to prove the following lemma. A complete proof is included in Section 7.5.

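Lemma 7.8 (Approximating $x^{-1}$) For every $\epsilon > 0$, $b > a > 0$, there exists a polynomial $q_{a,b,\epsilon}(x)$ of
degree $\left\lceil \sqrt{b/a} \cdot \log \frac{2}{\epsilon} \right\rceil$ such that $\sup_{x \in [a,b]} |x \cdot q_{a,b,\epsilon}(x) - 1| \le \epsilon$.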
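The Chebyshev-based idea described above can be tried numerically. The sketch below is one way to do it, assuming the candidate $x \cdot q(x) - 1 = -T_{k+1}\left(\frac{b+a-2x}{b-a}\right) / T_{k+1}\left(\frac{b+a}{b-a}\right)$ with $k = \lceil \sqrt{b/a} \log(2/\epsilon) \rceil$ and natural logarithms; the helper names and the sample values of $a$, $b$, $\epsilon$ are illustrative assumptions.

```python
import math

def chebyshev_T(n, y):
    """Evaluate the degree-n Chebyshev polynomial T_n at y via the three-term recurrence."""
    prev, cur = 1.0, y
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2 * y * cur - prev
    return cur

def residual(x, a, b, k):
    """Value of x*q(x) - 1 for the Chebyshev-based candidate q of degree k."""
    numerator = chebyshev_T(k + 1, (b + a - 2 * x) / (b - a))
    denominator = chebyshev_T(k + 1, (b + a) / (b - a))
    return -numerator / denominator

a, b, eps = 1.0, 100.0, 1e-3  # illustrative values
k = math.ceil(math.sqrt(b / a) * math.log(2 / eps))
# Grid estimate of sup over [a, b] of |x*q(x) - 1|.
worst = max(abs(residual(a + (b - a) * j / 5000, a, b, k)) for j in range(5001))
assert worst <= eps
```

The residual equals exactly $-1$ at $x = 0$, which is what makes $q$ itself a polynomial: the numerator $1 - T_{k+1}(\cdot)/T_{k+1}(\cdot)$ vanishes at $0$ and so has $x$ as a factor.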



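As a simple corollary, we can approximate $(1 + x/k)^{-1}$, or rather generally, $(1 + \nu x)^{-1}$ for some
$\nu > 0$, by polynomials. A proof is included in Section 7.5.

Corollary 7.9 (Approximating $(1 + \nu x)^{-1}$) For every $\nu > 0$, $\epsilon > 0$ and $b > a \ge 0$, there exists a
polynomial $q^*_{\nu,a,b,\epsilon}(x)$ of degree $\left\lceil \sqrt{\frac{1 + \nu b}{1 + \nu a}} \cdot \log \frac{2}{\epsilon} \right\rceil$ such that $\sup_{x \in [a,b]} |(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) - 1| \le \epsilon$.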

The above corollary implies that the expression $(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}$ is within $1 \pm \epsilon$ on $[a, b]$. If $\epsilon$ is small,
for a small positive integer $t$, $[(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}]^t$ should be within $1 \pm O(t\epsilon)$. The following lemma,
proved using the binomial theorem, shows this formally. A proof is included in Section 7.5.

Lemma 7.10 (Approximating $(1 + \nu x)^{-t}$) For all real $\epsilon > 0$, $b > a \ge 0$ and positive integer $t$; if $t\epsilon \le 1$,
then,

\[ \sup_{x \in [a,b]} \left| \left( (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right)^t - 1 \right| \le 2 t \epsilon, \]

where $q^*_{\nu,a,b,\epsilon}$ is the polynomial given by Corollary 7.9.
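The inequality behind this lemma reduces to $(1 + \epsilon)^t - 1 \le 2t\epsilon$ whenever $t\epsilon \le 1$, since a quantity within $1 \pm \epsilon$ raised to the power $t$ deviates from 1 by at most $(1+\epsilon)^t - 1$. A small numeric sketch, with illustrative values of $t$ and $\epsilon$:

```python
def power_deviation(eps, t):
    """Largest possible |u**t - 1| for u in [1 - eps, 1 + eps] with 0 <= eps <= 1,
    which is attained at u = 1 + eps."""
    return (1 + eps) ** t - 1

checks = []
for t in (1, 2, 5, 20, 100):
    for eps in (1.0 / t, 0.5 / t, 1e-3 / t):  # every pair satisfies t * eps <= 1
        checks.append(power_deviation(eps, t) <= 2 * t * eps)
all_ok = all(checks)
assert all_ok
```

The extreme case $t\epsilon = 1$ gives $(1 + 1/t)^t - 1 < e - 1 < 2$, so the factor 2 leaves some slack.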

Since $q^*$ is an approximation to $(1 + x/k)^{-1}$, in order to bound the error for the composition $p_k^*(q^*)$,
we need to bound how the value of the polynomial changes on small perturbations in the input.
We will use the following crude bound in terms of the $\ell_1$ norm of the polynomial.

Lemma 7.11 (Error in Polynomial) For any polynomial $p$ of degree $k$, and any $x, y \in \mathbb{R}$,
$|p(x) - p(y)| \le \|p\|_1 \cdot \max_{0 \le i \le k} |x^i - y^i|$.
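To illustrate how crude but simple this bound is, the sketch below evaluates both sides for one arbitrary cubic. The coefficients and the evaluation points are illustrative choices, not taken from the text.

```python
def l1_norm(coeffs):
    """l1 norm of a polynomial given as [a_0, a_1, ..., a_k]."""
    return sum(abs(c) for c in coeffs)

def evaluate(coeffs, x):
    """Horner evaluation of sum_i coeffs[i] * x**i."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def perturbation_bound(coeffs, x, y):
    """Right-hand side of the lemma: ||p||_1 * max_i |x^i - y^i|."""
    k = len(coeffs) - 1
    return l1_norm(coeffs) * max(abs(x ** i - y ** i) for i in range(k + 1))

coeffs = [0.0, 3.0, -2.0, 0.5]  # p(t) = 3t - 2t^2 + 0.5t^3, an arbitrary example
x, y = 0.9, 0.85
lhs = abs(evaluate(coeffs, x) - evaluate(coeffs, y))
rhs = perturbation_bound(coeffs, x, y)
assert lhs <= rhs
```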

In order to utilize the above lemma, we will need a bound on the $\ell_1$ norm of $p_k^*$, which is provided
by Lemma 6.21, which bounds the $\ell_1$ norm of any polynomial in $(1 + x/k)^{-1}$ that approximates the
exponential function and has no constant term. We restate the lemma here for completeness.

Lemma 7.12 ($\ell_1$-norm Bound. Lemma 6.21 Restated) Given a polynomial $p$ of degree $k$ such that $p(0) = 0$
and

\[ \sup_{t \in (0,1]} \left| e^{-k/t + k} - p(t) \right| = \sup_{x \in [0,\infty)} \left| e^{-x} - p\left( (1 + x/k)^{-1} \right) \right| \le 1, \]

we must have $\|p\|_1 \le (2k)^{k+1}$.



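We can now analyze the error in approximating $e^{-x}$ by the polynomial $p_k^*(q^*)$ and give a proof
for Theorem 1.5.

Proof: Given $\delta \le 1$, let $k = \max\{k_0,\ \log_2 \frac{4c_1}{\delta} + 2 \log_2 \log_2 \frac{4c_1}{\delta}\} = O(\log 1/\delta)$, where $k_0$, $c_1$ are the
constants given by Corollary 7.7. Moreover, $p_k^*$ is the degree $k$ polynomial given by Corollary 7.7,
which gives $p_k^*(0) = 0$, and,

\begin{align*}
\sup_{x \in [0,\infty)} \left| e^{-x} - p_k^*\left( (1 + x/k)^{-1} \right) \right|
  &\le c_1 k \cdot 2^{-k}
   \le \frac{\delta}{4} \cdot \frac{\log_2 \frac{4c_1}{\delta} + 2 \log_2 \log_2 \frac{4c_1}{\delta}}{\left( \log_2 \frac{4c_1}{\delta} \right)^2} \\
  &\le \frac{\delta}{4} \cdot \frac{1}{\log_2 \frac{4c_1}{\delta}} \cdot \left( 1 + 2 \cdot \frac{\log_2 \log_2 \frac{4c_1}{\delta}}{\log_2 \frac{4c_1}{\delta}} \right)
   \le \frac{\delta}{4} \cdot \frac{1}{2} \cdot 3 \le \frac{\delta}{2}, \tag{19}
\end{align*}

where the last inequality uses $\delta \le 1 \le c_1$ and $\log_2 x \le x$, $\forall x \ge 0$. Thus, we can use Lemma 7.12 to
conclude that $\|p_k^*\|_1 \le (2k)^{k+1}$.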
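The choice of $k$ can be checked numerically. In the sketch below, `c1` and `k0` are hypothetical placeholder values for the unspecified constants of Corollary 7.7 (the text does not give them), and $k$ is rounded up to an integer; under those assumptions the code confirms $c_1 k \cdot 2^{-k} \le \delta/2$ for a few values of $\delta$.

```python
import math

def choose_k(delta, c1=1.0, k0=1):
    """k = max{k0, log2(4*c1/delta) + 2*log2(log2(4*c1/delta))}, rounded up.
    c1 and k0 are placeholders for the constants of Corollary 7.7."""
    big_l = math.log2(4 * c1 / delta)
    return max(k0, math.ceil(big_l + 2 * math.log2(big_l)))

def corollary_error(k, c1=1.0):
    """The error guarantee c1 * k * 2^{-k} from Corollary 7.7."""
    return c1 * k * 2.0 ** (-k)

for delta in (1.0, 0.1, 1e-3, 1e-6):
    assert corollary_error(choose_k(delta)) <= delta / 2
```

For $\delta = 1$ and $c_1 = 1$ this gives $k = 4$ and an error bound of $4 \cdot 2^{-4} = 1/4$, comfortably below $\delta/2 = 1/2$.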

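Let $\nu = 1/k$. Define $\epsilon$ as $\epsilon = \frac{\delta}{2 (2k)^{k+2}}$. Let $p_{a,b,\delta}(x) = e^{-a} \cdot p_k^*\left( q^*_{\nu,0,b-a,\epsilon}(x - a) \right)$, where $q^*_{\nu,0,b-a,\epsilon}$ is
the polynomial of degree $\left\lceil \sqrt{1 + \nu(b - a)} \cdot \log \frac{2}{\epsilon} \right\rceil$ given by Corollary 7.9. Observe that $p_{a,b,\delta}(x)$ is a
polynomial whose degree is the product of the degrees of $p_k^*$ and $q^*_{\nu,0,b-a,\epsilon}$, i.e., $k \left\lceil \sqrt{1 + \nu(b - a)} \cdot \log \frac{2}{\epsilon} \right\rceil$.
Also note that $k\epsilon \le 1$ and hence we can use Lemma 7.10. We show that $p_{a,b,\delta}$ is a uniform
$\delta \cdot e^{-a}$-approximation to $e^{-x}$ on the interval $[a, b]$.

\begin{align*}
\sup_{x \in [a,b]} \left| e^{-x} - p_{a,b,\delta}(x) \right|
  &= e^{-a} \cdot \sup_{x \in [a,b]} \left| e^{-(x-a)} - e^{a} \cdot p_{a,b,\delta}(x) \right|
   = e^{-a} \cdot \sup_{y \in [0,b-a]} \left| e^{-y} - e^{a} \cdot p_{a,b,\delta}(y + a) \right| \\
  &= e^{-a} \cdot \sup_{y \in [0,b-a]} \left| e^{-y} - p_k^*\left( q^*_{\nu,0,b-a,\epsilon}(y) \right) \right| \\
  &\le e^{-a} \cdot \sup_{y \in [0,b-a]} \left( \left| e^{-y} - p_k^*\left( (1 + \nu y)^{-1} \right) \right| + \left| p_k^*\left( (1 + \nu y)^{-1} \right) - p_k^*\left( q^*_{\nu,0,b-a,\epsilon}(y) \right) \right| \right)
   \quad \text{(triangle inequality)} \\
  &\le e^{-a} \cdot \sup_{y \in [0,b-a]} \left| e^{-y} - p_k^*\left( (1 + \nu y)^{-1} \right) \right|
     + e^{-a} \cdot \sup_{y \in [0,b-a]} \left| p_k^*\left( (1 + \nu y)^{-1} \right) - p_k^*\left( q^*_{\nu,0,b-a,\epsilon}(y) \right) \right| \\
  &\le e^{-a} \cdot \sup_{y \in [0,\infty)} \left| e^{-y} - p_k^*\left( (1 + \nu y)^{-1} \right) \right|
     + e^{-a} \cdot \|p_k^*\|_1 \cdot \max_{0 \le i \le k} \sup_{y \in [0,b-a]} \left| (1 + \nu y)^{-i} - \left( q^*_{\nu,0,b-a,\epsilon}(y) \right)^i \right|
   \quad \text{(Lemma 7.11)} \\
  &\le e^{-a} \cdot \frac{\delta}{2}
     + e^{-a} \cdot \|p_k^*\|_1 \cdot \max_{0 \le i \le k} \sup_{y \in [0,b-a]} (1 + \nu y)^{-i} \cdot \left| 1 - \left( (1 + \nu y) \cdot q^*_{\nu,0,b-a,\epsilon}(y) \right)^i \right|
   \quad \text{(Equation (19))} \\
  &\le e^{-a} \cdot \frac{\delta}{2} + e^{-a} \cdot (2k)^{k+1} \cdot 2k\epsilon = \delta \cdot e^{-a}
   \quad \text{(Lemmas 7.10 and 7.12, and $(1 + \nu y)^{-i} \le 1$)}.
\end{align*}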



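The degree of the polynomial $p_{a,b,\delta}$ is

\begin{align*}
k \left\lceil \sqrt{1 + \frac{1}{k}(b - a)} \cdot \log \frac{2}{\epsilon} \right\rceil
  &= O\left( \sqrt{k^2 + k(b - a)} \cdot (k \log k + \log 1/\delta) \right) \\
  &= O\left( \sqrt{\max\{\log^2 1/\delta,\ \log 1/\delta \cdot (b - a)\}} \cdot (\log 1/\delta) \cdot \log\log 1/\delta \right).
\end{align*}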
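The $\sqrt{b - a}$ growth of this degree can be seen by evaluating the expression directly. The sketch below is only indicative: `c1` and `k0` are hypothetical stand-ins for the constants of Corollary 7.7, $\epsilon$ is assumed to be $\delta / (2(2k)^{k+2})$ (the value that makes the two error terms in the proof sum to $\delta \cdot e^{-a}$), and logarithms of $2/\epsilon$ are natural.

```python
import math

def composite_degree(a, b, delta, c1=1.0, k0=1):
    """Degree k * ceil(sqrt(1 + (b - a)/k) * log(2/eps)) of the composed polynomial,
    with k chosen as in the proof and eps = delta / (2 * (2k)^(k+2)) (an assumption)."""
    big_l = math.log2(4 * c1 / delta)
    k = max(k0, math.ceil(big_l + 2 * math.log2(big_l)))
    # log(2/eps) = log(4/delta) + (k + 2) * log(2k), computed without forming eps itself.
    log_two_over_eps = math.log(4 / delta) + (k + 2) * math.log(2 * k)
    inner = math.ceil(math.sqrt(1 + (b - a) / k) * log_two_over_eps)
    return k * inner

delta = 0.01
deg_small = composite_degree(0.0, 1e6, delta)
deg_large = composite_degree(0.0, 4e6, delta)
ratio = deg_large / deg_small  # quadrupling b - a should roughly double the degree
assert 1.9 < ratio < 2.1
```

Quadrupling the interval length roughly doubles the degree, in contrast to a bound that is linear in $b - a$, which would quadruple.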



7.4 Proof of Lower Bound in Theorem 1.5

In this section, we will use the following well known theorem of Markov from approximation
theory to give a proof of the lower bound result in Theorem 1.5.

Theorem 7.13 (Markov, See [10]) Let $p : \mathbb{R} \to \mathbb{R}$ be a univariate polynomial of degree $d$ such that any
real number $a_1 \le x \le a_2$ satisfies $b_1 \le p(x) \le b_2$. Then, for all $a_1 \le x \le a_2$, the derivative of $p$ satisfies

\[ |p'(x)| \le d^2 \cdot \frac{b_2 - b_1}{a_2 - a_1}. \]
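Markov's bound is tight for Chebyshev polynomials, which gives a quick way to see it in action. The sketch below uses the explicit cubic $T_3(x) = 4x^3 - 3x$, which takes values in $[-1, 1]$ on $[-1, 1]$; the grid resolution is an arbitrary choice.

```python
def cheb3(x):
    """T_3(x) = 4x^3 - 3x, a degree-3 polynomial with values in [-1, 1] on [-1, 1]."""
    return 4 * x ** 3 - 3 * x

def cheb3_derivative(x):
    """Derivative of T_3."""
    return 12 * x ** 2 - 3

def markov_bound(d, a1, a2, b1, b2):
    """Right-hand side of Markov's theorem: d^2 * (b2 - b1) / (a2 - a1)."""
    return d ** 2 * (b2 - b1) / (a2 - a1)

grid = [-1 + 2 * j / 1000 for j in range(1001)]
assert all(-1.0 <= cheb3(x) <= 1.0 for x in grid)       # range hypothesis holds
max_derivative = max(abs(cheb3_derivative(x)) for x in grid)
bound = markov_bound(3, -1.0, 1.0, -1.0, 1.0)
assert max_derivative <= bound                           # equality at x = -1 and x = 1
```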

The idea is to first use the uniform approximation bound to bound the value of the polynomial
within the interval of approximation. Next, we use the approximation bound and the Mean Value
theorem to show that there must exist a point $t$ in the interval where $|p'(t)|$ is large. We plug both
these bounds into Markov's theorem to deduce our lower bound.

Proof: Suppose $p$ is a degree $k$ polynomial that is a uniform approximation to $e^{-x}$ over the
interval $[a, b]$ up to an error of $\delta \cdot e^{-a}$. For any $x \in [a, b]$, this bounds the values $p$ can take at
$x$. Since $p$ is a uniform approximation to $e^{-x}$ over $[a, b]$ up to an error of $\delta \cdot e^{-a}$, we know that
for all $x \in [a, b]$, $e^{-x} - \delta \cdot e^{-a} \le p(x) \le e^{-x} + \delta \cdot e^{-a}$. Thus, $\max_{x \in [a,b]} p(x) \le e^{-a} + \delta \cdot e^{-a}$ and
$\min_{x \in [a,b]} p(x) \ge e^{-b} - \delta \cdot e^{-a}$.



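Assume that $\delta \le 1/8$, and $b \ge a + \log_e 4 \ge a + \log_e \frac{2}{1 - 4\delta}$. Applying the Mean Value theorem
on the interval $[a, a + \log_e \frac{2}{1 - 4\delta}]$, we know that there exists $t \in [a, a + \log_e \frac{2}{1 - 4\delta}]$, such that,

\begin{align*}
|p'(t)| &= \left| \frac{p\left( a + \log_e \frac{2}{1 - 4\delta} \right) - p(a)}{\log_e \frac{2}{1 - 4\delta}} \right|
   \ge \frac{(e^{-a} - \delta \cdot e^{-a}) - \left( e^{-a - \log_e \frac{2}{1 - 4\delta}} + \delta \cdot e^{-a} \right)}{\log_e \frac{2}{1 - 4\delta}} \\
  &= e^{-a} \cdot \frac{1 - 2\delta - \frac{1 - 4\delta}{2}}{\log_e \frac{2}{1 - 4\delta}}
   = e^{-a} \cdot \frac{1}{2 \log_e \frac{2}{1 - 4\delta}}.
\end{align*}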

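We plug this in Markov's theorem (Theorem 7.13) stated above to deduce,

\[ e^{-a} \cdot \frac{1}{2 \log_e \frac{2}{1 - 4\delta}} \le |p'(t)| \le k^2 \cdot \frac{(e^{-a} + \delta \cdot e^{-a}) - (e^{-b} - \delta \cdot e^{-a})}{b - a} \le k^2 \cdot \frac{e^{-a} (1 + 2\delta)}{b - a}. \]

Rearranging, we get,

\[ k \ge \sqrt{\frac{b - a}{2 \cdot (1 + 2\delta) \cdot \log_e \frac{2}{1 - 4\delta}}} \ge \sqrt{\frac{b - a}{2 \cdot \frac{5}{4} \cdot \log_e 4}} \ge \frac{1}{2} \cdot \sqrt{b - a}, \]

where the second inequality uses $\delta \le 1/8$.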
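The constant in the final step can be verified numerically. The sketch below assumes the rearranged bound has the form $\sqrt{(b-a) / (2(1 + 2\delta)\log_e\frac{2}{1-4\delta})}$ and checks that it is at least $\frac{1}{2}\sqrt{b - a}$ for several $\delta \in (0, 1/8]$; the interval length is an arbitrary illustrative value.

```python
import math

def degree_lower_bound(a, b, delta):
    """sqrt((b - a) / (2 * (1 + 2*delta) * ln(2 / (1 - 4*delta)))), for 0 < delta <= 1/8."""
    return math.sqrt((b - a) / (2 * (1 + 2 * delta) * math.log(2 / (1 - 4 * delta))))

a, b = 0.0, 100.0  # illustrative interval with b >= a + ln 4
for delta in (1 / 8, 0.1, 0.01, 1e-6):
    assert degree_lower_bound(a, b, delta) >= 0.5 * math.sqrt(b - a)
```

At $\delta = 1/8$ the denominator is $2 \cdot \frac{5}{4} \cdot \log_e 4 \approx 3.47 < 4$, which is where the factor $\frac{1}{2}$ comes from.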



7.5 Remaining Proofs 

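Lemma 7.14 (Lemma 7.4 Restated) For any non-negative integers $k$ and $l$ and real numbers $b > a$,

\[ \min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{t \in [a,b]} \left| e^{-t} - \frac{p_k(t)}{q_l(t)} \right|
   = e^{-\frac{b+a}{2}} \cdot \min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{x \in [-1,1]} \left| e^{\frac{(b-a)}{2} x} - \frac{p_k(x)}{q_l(x)} \right|. \]

Proof: Using the substitution $t \stackrel{\text{def}}{=} \frac{(b+a)}{2} - \frac{(b-a)}{2} x$,

\begin{align*}
\min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{t \in [a,b]} \left| e^{-t} - \frac{p_k(t)}{q_l(t)} \right|
  &= \min_{p_k \in \Sigma_k,\, q_l \in \Sigma_l} \ \sup_{x \in [-1,1]} \left| e^{-\frac{(b+a)}{2} + \frac{(b-a)}{2} x} - \frac{p_k\left( (b+a)/2 - (b-a)/2 \cdot x \right)}{q_l\left( (b+a)/2 - (b-a)/2 \cdot x \right)} \right| \\
  &= e^{-\frac{b+a}{2}} \cdot \min_{p'_k \in \Sigma_k,\, q'_l \in \Sigma_l} \ \sup_{x \in [-1,1]} \left| e^{\frac{(b-a)}{2} x} - \frac{p'_k(x)}{q'_l(x)} \right|,
\end{align*}

where the last step uses that an invertible linear change of variable, together with scaling the
numerator by $e^{\frac{b+a}{2}}$, maps $\Sigma_k$ and $\Sigma_l$ onto themselves.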



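Lemma 7.15 (Lemma 7.5 Restated) The degree $k$ polynomial obtained by truncating Taylor's expansion
of $e^{-t}$ around the point $\frac{b+a}{2}$ is a uniform approximation to $e^{-t}$ on the interval $[a, b]$ up to an error of

\[ e^{-\frac{b+a}{2}} \cdot \sum_{i=k+1}^{\infty} \frac{1}{i!} \left( \frac{b - a}{2} \right)^i, \]

which is smaller than $\delta \cdot e^{-\frac{b+a}{2}}$ for $k \ge \max\left\{ \frac{e^2 (b-a)}{2},\ \log 1/\delta \right\}$.

Proof: Let $q_k(t)$ be the degree $k$ Taylor approximation of the function $e^{-t}$ around the point $\frac{b+a}{2}$,

\[ \sup_{t \in [a,b]} \left| e^{-t} - q_k(t) \right|
   = \sup_{t \in [a,b]} e^{-\frac{b+a}{2}} \cdot \left| \sum_{i=k+1}^{\infty} \frac{1}{i!} \left( \frac{b+a}{2} - t \right)^i \right|
   \le e^{-\frac{b+a}{2}} \cdot \sum_{i=k+1}^{\infty} \frac{1}{i!} \left( \frac{b - a}{2} \right)^i. \]

Using the inequality $i! > \left( \frac{i}{e} \right)^i$, for all $i$, and assuming $k \ge \frac{e^2 (b-a)}{2}$, we get,

\[ \sum_{i=k+1}^{\infty} \frac{1}{i!} \left( \frac{b - a}{2} \right)^i
   \le \sum_{i=k+1}^{\infty} \left( \frac{e (b - a)}{2i} \right)^i
   \le \sum_{i=k+1}^{\infty} e^{-i}
   = \frac{e^{-k}}{e - 1}, \]

which is smaller than $\delta$ for $k \ge \log 1/\delta$.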



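Lemma 7.16 (Lemma 7.8 Restated) For every $\epsilon > 0$, $b > a > 0$, there exists a polynomial $q_{a,b,\epsilon}(x)$ of
degree $\left\lceil \sqrt{b/a} \cdot \log \frac{2}{\epsilon} \right\rceil$ such that

\[ \sup_{x \in [a,b]} |x \cdot q_{a,b,\epsilon}(x) - 1| \le \epsilon. \]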



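Proof: If $T_{k+1}(x)$ denotes the degree $k + 1$ Chebyshev polynomial, consider the function,

\[ q_{a,b,\epsilon}(x) \stackrel{\text{def}}{=} \frac{1}{x} \left( 1 - \frac{T_{k+1}\left( \frac{b + a - 2x}{b - a} \right)}{T_{k+1}\left( \frac{b + a}{b - a} \right)} \right). \]

First, we need to prove that the above expression is a polynomial. Clearly $1 - T_{k+1}\left( \frac{b+a-2x}{b-a} \right) / T_{k+1}\left( \frac{b+a}{b-a} \right)$ is a
polynomial and evaluates to $0$ at $x = 0$. Thus, it must have $x$ as a factor. Thus $q_{a,b,\epsilon}$ is a polynomial
of degree $k$. Let $\kappa = b/a$ and note that $\kappa > 1$. Thus,

\begin{align*}
\sup_{x \in [a,b]} |x \cdot q_{a,b,\epsilon}(x) - 1|
  &= \sup_{x \in [a,b]} \left| \frac{T_{k+1}\left( \frac{b + a - 2x}{b - a} \right)}{T_{k+1}\left( \frac{b + a}{b - a} \right)} \right|
   \le T_{k+1}\left( \frac{b + a}{b - a} \right)^{-1}
   \quad \text{(Since $|T_{k+1}(y)| \le 1$ for $|y| \le 1$)} \\
  &= 2 \left( \left( \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1} \right)^{k+1} + \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k+1} \right)^{-1}
   \quad \text{(By def.)} \\
  &\le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k+1}
   \quad \text{(Each term is positive since $\sqrt{\kappa} > 1$)} \\
  &\le 2 \cdot \left( 1 - 1/\sqrt{\kappa} \right)^{k+1} \le 2 \cdot e^{-(k+1)/\sqrt{\kappa}} \le \epsilon,
\end{align*}

for $k = \left\lceil \sqrt{\kappa} \log \frac{2}{\epsilon} \right\rceil$. The first inequality follows from the fact that $|T_{k+1}(x)| \le 1$ for all $|x| \le 1$.

Corollary 7.17 (Corollary 7.9 Restated) For every $\nu > 0$, $\epsilon > 0$ and $b > a \ge 0$, there exists a
polynomial $q^*_{\nu,a,b,\epsilon}(x)$ of degree $\left\lceil \sqrt{\frac{1 + \nu b}{1 + \nu a}} \cdot \log \frac{2}{\epsilon} \right\rceil$ such that

\[ \sup_{x \in [a,b]} |(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) - 1| \le \epsilon. \]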

xG[a,b] 



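Proof: Consider the polynomial $q^*_{\nu,a,b,\epsilon}(x) \stackrel{\text{def}}{=} q_{1+\nu a,\,1+\nu b,\,\epsilon}(1 + \nu x)$, where $q_{1+\nu a,\,1+\nu b,\,\epsilon}$ is given by
the previous lemma. Substituting $t \stackrel{\text{def}}{=} 1 + \nu x$,

\begin{align*}
\sup_{x \in [a,b]} |(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) - 1|
  &= \sup_{x \in [a,b]} |(1 + \nu x) \cdot q_{1+\nu a,\,1+\nu b,\,\epsilon}(1 + \nu x) - 1| \\
  &= \sup_{t \in [1+\nu a,\,1+\nu b]} |t \cdot q_{1+\nu a,\,1+\nu b,\,\epsilon}(t) - 1| \le \epsilon
   \quad \text{(Lemma 7.16)}.
\end{align*}

Since $1 + \nu x$ is a linear transformation, the degree of $q^*_{\nu,a,b,\epsilon}$ is the same as that of $q_{1+\nu a,\,1+\nu b,\,\epsilon}$, which
is $\left\lceil \sqrt{\frac{1 + \nu b}{1 + \nu a}} \cdot \log \frac{2}{\epsilon} \right\rceil$.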



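Lemma 7.18 (Lemma 7.10 Restated) For all real $\epsilon > 0$, $b > a \ge 0$ and positive integer $t$; if $t\epsilon \le 1$,
then,

\[ \sup_{x \in [a,b]} \left| \left( (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right)^t - 1 \right| \le 2 t \epsilon, \]

where $q^*_{\nu,a,b,\epsilon}$ is the polynomial given by Corollary 7.9.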

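Proof: We write the expression $(1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x)$ as 1 plus an error term and then use the
Binomial Theorem to expand the power.

\begin{align*}
\sup_{x \in [a,b]} \left| \left( (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right)^t - 1 \right|
  &= \sup_{x \in [a,b]} \left| \left( 1 - \left[ 1 - (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right] \right)^t - 1 \right| \\
  &\le \sup_{x \in [a,b]} \sum_{i=1}^{t} \binom{t}{i} \left| 1 - (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right|^i \\
  &\le \sum_{i=1}^{t} \binom{t}{i} \sup_{x \in [a,b]} \left| 1 - (1 + \nu x) \cdot q^*_{\nu,a,b,\epsilon}(x) \right|^i \\
  &\le \sum_{i=1}^{t} \binom{t}{i} \epsilon^i = (1 + \epsilon)^t - 1
   \quad \text{(Corollary 7.17)} \\
  &\le \exp(t\epsilon) - 1 \le 1 + t\epsilon + (t\epsilon)^2 - 1 \le 2 t \epsilon,
\end{align*}

where the second last inequality uses $e^x \le 1 + x + x^2$ for $x \in [0, 1]$.^1

^1 For $x \in [0, 1]$, $e^x = \sum_{i \ge 0} \frac{x^i}{i!} = 1 + x + x^2 \left( \frac{1}{2!} + \frac{x}{3!} + \frac{x^2}{4!} + \cdots \right) \le 1 + x + x^2 \left( \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots \right) \le 1 + x + x^2$.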



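Lemma 7.19 (Lemma 7.11 Restated) For any polynomial $p$ of degree $k$, and any $x, y \in \mathbb{R}$,

\[ |p(x) - p(y)| \le \|p\|_1 \cdot \max_{0 \le i \le k} |x^i - y^i|. \]

Proof: Suppose $p(t)$ is the polynomial $\sum_{i=0}^{k} a_i \cdot t^i$, where $a_i \in \mathbb{R}$. Then,

\begin{align*}
|p(x) - p(y)| &= \left| \sum_{i=0}^{k} a_i \cdot x^i - \sum_{i=0}^{k} a_i \cdot y^i \right|
   \le \sum_{i=0}^{k} |a_i| \cdot |x^i - y^i| \\
  &\le \left( \sum_{i=0}^{k} |a_i| \right) \cdot \max_{0 \le i \le k} |x^i - y^i|
   = \|p\|_1 \cdot \max_{0 \le i \le k} |x^i - y^i|.
\end{align*}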